(spark-docker) branch master updated: [SPARK-47206][FOLLOWUP] Fix wrong path version

2024-02-28 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 4f2d96a  [SPARK-47206][FOLLOWUP] Fix wrong path version
4f2d96a is described below

commit 4f2d96a415c89cfe0fde89a55e9034d095224c94
Author: Yikun Jiang 
AuthorDate: Thu Feb 29 09:49:01 2024 +0800

[SPARK-47206][FOLLOWUP] Fix wrong path version

### What changes were proposed in this pull request?
Fix the wrong `path` version in versions.json (the 3.5.1 entry still pointed at a 3.5.0 directory).

### Why are the changes needed?
This metadata will be consumed by https://github.com/docker-library/official-images , so each `path` must point at the correct versioned directory.
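
A hedged sanity check for this class of bug (illustrative only: it assumes versions.json is a top-level JSON array of objects with a `path` field and that `jq` is available; adjust the filter if the real layout differs):

```
# Hypothetical check: every "path" in versions.json should name an existing directory.
jq -r '.[].path' versions.json | while read -r p; do
    [ -d "$p" ] || echo "missing directory: $p"
done
```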

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

```
$ tools/manifest.py manifest

Maintainers: Apache Spark Developers  (ApacheSpark)
GitRepo: https://github.com/apache/spark-docker.git

Tags: 3.5.1-scala2.12-java17-python3-ubuntu, 3.5.1-java17-python3, 
3.5.1-java17, python3-java17
Architectures: amd64, arm64v8
GitCommit: 8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676
Directory: ./3.5.1/scala2.12-java17-python3-ubuntu

Tags: 3.5.1-scala2.12-java17-r-ubuntu, 3.5.1-java17-r
Architectures: amd64, arm64v8
GitCommit: 8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676
Directory: ./3.5.1/scala2.12-java17-r-ubuntu

Tags: 3.5.1-scala2.12-java17-ubuntu, 3.5.1-java17-scala
Architectures: amd64, arm64v8
GitCommit: 8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676
Directory: ./3.5.1/scala2.12-java17-ubuntu

Tags: 3.5.1-scala2.12-java17-python3-r-ubuntu
Architectures: amd64, arm64v8
GitCommit: 8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676
Directory: ./3.5.1/scala2.12-java17-python3-r-ubuntu

Tags: 3.5.1-scala2.12-java11-python3-ubuntu, 3.5.1-python3, 3.5.1, python3, 
latest
Architectures: amd64, arm64v8
GitCommit: 8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676
Directory: ./3.5.1/scala2.12-java11-python3-ubuntu

Tags: 3.5.1-scala2.12-java11-r-ubuntu, 3.5.1-r, r
Architectures: amd64, arm64v8
GitCommit: 8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676
Directory: ./3.5.1/scala2.12-java11-r-ubuntu

Tags: 3.5.1-scala2.12-java11-ubuntu, 3.5.1-scala, scala
Architectures: amd64, arm64v8
GitCommit: 8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676
Directory: ./3.5.1/scala2.12-java11-ubuntu

Tags: 3.5.1-scala2.12-java11-python3-r-ubuntu
Architectures: amd64, arm64v8
GitCommit: 8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676
Directory: ./3.5.1/scala2.12-java11-python3-r-ubuntu
```

Closes #60 from Yikun/3.5.1-follow.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 versions.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/versions.json b/versions.json
index 3d3e3b9..6ea6d71 100644
--- a/versions.json
+++ b/versions.json
@@ -30,7 +30,7 @@
   ]
 },
 {
-  "path": "3.5.0/scala2.12-java11-python3-ubuntu",
+  "path": "3.5.1/scala2.12-java11-python3-ubuntu",
   "tags": [
 "3.5.1-scala2.12-java11-python3-ubuntu",
 "3.5.1-python3",





(spark-docker) branch master updated: [SPARK-47206] Add official image Dockerfile for Apache Spark 3.5.1

2024-02-28 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 7216374  [SPARK-47206] Add official image Dockerfile for Apache Spark 
3.5.1
7216374 is described below

commit 7216374855ba57ce14c8ddbf56890538f678ec3d
Author: Yikun Jiang 
AuthorDate: Thu Feb 29 08:55:47 2024 +0800

[SPARK-47206] Add official image Dockerfile for Apache Spark 3.5.1

### What changes were proposed in this pull request?
Add Apache Spark 3.5.1 Dockerfiles.

- Add the 3.5.1 GPG key
- Add .github/workflows/build_3.5.1.yaml
- Run `./add-dockerfiles.sh 3.5.1` to generate the Dockerfiles
- Add version and tag info (see the sketch below)
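
A minimal sketch of the release flow described above (the commands follow this repo's existing layout; the workflow name matches the established `build_<version>.yaml` pattern):

```
# Generate the 3.5.1 Dockerfiles from the repo templates.
./add-dockerfiles.sh 3.5.1

# Stage the generated files, the new CI workflow, and the updated metadata.
git add 3.5.1/ .github/workflows/build_3.5.1.yaml versions.json tools/template.py
```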

### Why are the changes needed?
Apache Spark 3.5.1 released

### Does this PR introduce _any_ user-facing change?
Docker image will be published.

### How was this patch tested?
Add workflow and CI passed

Closes #59 from Yikun/3.5.1.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/build_3.5.1.yaml |  43 +++
 .github/workflows/publish.yml  |   4 +-
 .github/workflows/test.yml |   3 +-
 3.5.1/scala2.12-java11-python3-r-ubuntu/Dockerfile |  29 +
 3.5.1/scala2.12-java11-python3-ubuntu/Dockerfile   |  26 +
 3.5.1/scala2.12-java11-r-ubuntu/Dockerfile |  28 +
 3.5.1/scala2.12-java11-ubuntu/Dockerfile   |  79 +
 3.5.1/scala2.12-java11-ubuntu/entrypoint.sh| 130 +
 3.5.1/scala2.12-java17-python3-r-ubuntu/Dockerfile |  29 +
 3.5.1/scala2.12-java17-python3-ubuntu/Dockerfile   |  26 +
 3.5.1/scala2.12-java17-r-ubuntu/Dockerfile |  28 +
 3.5.1/scala2.12-java17-ubuntu/Dockerfile   |  79 +
 3.5.1/scala2.12-java17-ubuntu/entrypoint.sh| 130 +
 tools/template.py  |   4 +-
 versions.json  |  74 ++--
 15 files changed, 699 insertions(+), 13 deletions(-)

diff --git a/.github/workflows/build_3.5.1.yaml 
b/.github/workflows/build_3.5.1.yaml
new file mode 100644
index 0000000..65a8d5d
--- /dev/null
+++ b/.github/workflows/build_3.5.1.yaml
@@ -0,0 +1,43 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build and Test (3.5.1)"
+
+on:
+  pull_request:
+branches:
+  - 'master'
+paths:
+  - '3.5.1/**'
+
+jobs:
+  run-build:
+strategy:
+  matrix:
+image-type: ["all", "python", "scala", "r"]
+java: [11, 17]
+name: Run
+secrets: inherit
+uses: ./.github/workflows/main.yml
+with:
+  spark: 3.5.1
+  scala: 2.12
+  java: ${{ matrix.java }}
+  image-type: ${{ matrix.image-type }}
+
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
index 2f828a4..5dfc210 100644
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -25,10 +25,10 @@ on:
   spark:
 description: 'The Spark version of Spark image.'
 required: true
-default: '3.5.0'
+default: '3.5.1'
 type: choice
 options:
-- 3.5.0
+- 3.5.1
   publish:
 description: 'Publish the image or not.'
 default: false
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index df79364..9c08b33 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -25,9 +25,10 @@ on:
   spark:
 description: 'The Spark version of Spark image.'
 required: true
-default: '3.5.0'
+default: '3.5.1'
 type: choice
 options:
+- 3.5.1
 - 3.5.0
 - 3.4.2
 - 3.4.1
diff --git a/3.5.1/scala2.12-java11-python3-r-ubuntu/Dockerfile 
b/3.5.1/scala2.12-java11-python3-r-ubuntu/Dockerfile
new file mode 100644
index 0000000..57c044b
--- /dev/null
+++ b/3.5.1/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -0,0 +1,29 @@
+#
+# Licensed to the Apa

(spark-docker) branch master updated: [SPARK-46209] Add java 11 only yml for version before 3.5

2023-12-02 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 431aa51  [SPARK-46209] Add java 11 only yml for version before 3.5
431aa51 is described below

commit 431aa516ba58985c902bf2d2a07bf0eaa1df6740
Author: Yikun Jiang 
AuthorDate: Sat Dec 2 20:36:29 2023 +0800

[SPARK-46209] Add java 11 only yml for version before 3.5

### What changes were proposed in this pull request?
Add a Java 11-only publish workflow for versions before 3.5.0.

### Why are the changes needed?
Otherwise, publishing fails because no Java 17 Dockerfiles exist for versions before 3.5.0.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Test on my repo: 
https://github.com/Yikun/spark-docker/actions/workflows/publish-java11.yml

Closes #58 from Yikun/java11-publish.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/{publish.yml => publish-java11.yml} | 9 -
 .github/workflows/publish.yml | 7 ---
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/.github/workflows/publish.yml 
b/.github/workflows/publish-java11.yml
similarity index 96%
copy from .github/workflows/publish.yml
copy to .github/workflows/publish-java11.yml
index ec0d66c..caa3702 100644
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish-java11.yml
@@ -17,7 +17,7 @@
 # under the License.
 #
 
-name: "Publish"
+name: "Publish (Java 11 only)"
 
 on:
   workflow_dispatch:
@@ -25,10 +25,9 @@ on:
   spark:
 description: 'The Spark version of Spark image.'
 required: true
-default: '3.5.0'
+default: '3.4.2'
 type: choice
 options:
-- 3.5.0
 - 3.4.2
 - 3.4.1
 - 3.4.0
@@ -59,7 +58,7 @@ jobs:
 strategy:
   matrix:
 scala: [2.12]
-java: [11, 17]
+java: [11]
 image-type: ["scala"]
 permissions:
   packages: write
@@ -81,7 +80,7 @@ jobs:
 strategy:
   matrix:
 scala: [2.12]
-java: [11, 17]
+java: [11]
 image-type: ["all", "python", "r"]
 permissions:
   packages: write
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
index ec0d66c..2f828a4 100644
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -29,13 +29,6 @@ on:
 type: choice
 options:
 - 3.5.0
-- 3.4.2
-- 3.4.1
-- 3.4.0
-- 3.3.3
-- 3.3.2
-- 3.3.1
-- 3.3.0
   publish:
 description: 'Publish the image or not.'
 default: false





(spark-docker) branch master updated: [SPARK-46185] Add official image Dockerfile for Apache Spark 3.4.2

2023-12-01 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new ec69b9c  [SPARK-46185] Add official image Dockerfile for Apache Spark 
3.4.2
ec69b9c is described below

commit ec69b9c77bc733ed5937f5068d23f7407eb51ea9
Author: Yikun Jiang 
AuthorDate: Sat Dec 2 10:00:48 2023 +0800

[SPARK-46185] Add official image Dockerfile for Apache Spark 3.4.2

### What changes were proposed in this pull request?
Add Apache Spark 3.4.2 Dockerfiles.

- Add 3.4.2 GPG key
- Add .github/workflows/build_3.4.2.yaml
- `./add-dockerfiles.sh 3.4.2` to generate dockerfiles (and remove master 
changes: 
https://github.com/apache/spark-docker/pull/55/commits/24cbf40abdc252fdcf48303efa33ba7f84adefaf)
- Add version and tag info

### Why are the changes needed?
Apache Spark 3.4.2 released

### Does this PR introduce _any_ user-facing change?
Docker image will be published.

### How was this patch tested?
Add workflow and CI passed

Closes #57 from Yikun/3.4.2.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/build_3.4.2.yaml |  41 +++
 .github/workflows/publish.yml  |   1 +
 .github/workflows/test.yml |   1 +
 3.4.2/scala2.12-java11-python3-r-ubuntu/Dockerfile |  29 +
 3.4.2/scala2.12-java11-python3-ubuntu/Dockerfile   |  26 +
 3.4.2/scala2.12-java11-r-ubuntu/Dockerfile |  28 +
 3.4.2/scala2.12-java11-ubuntu/Dockerfile   |  79 +
 3.4.2/scala2.12-java11-ubuntu/entrypoint.sh| 126 +
 tools/template.py  |   2 +
 versions.json  |  28 +
 10 files changed, 361 insertions(+)

diff --git a/.github/workflows/build_3.4.2.yaml 
b/.github/workflows/build_3.4.2.yaml
new file mode 100644
index 0000000..8ae17d1
--- /dev/null
+++ b/.github/workflows/build_3.4.2.yaml
@@ -0,0 +1,41 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build and Test (3.4.2)"
+
+on:
+  pull_request:
+branches:
+  - 'master'
+paths:
+  - '3.4.2/**'
+
+jobs:
+  run-build:
+strategy:
+  matrix:
+image-type: ["all", "python", "scala", "r"]
+name: Run
+secrets: inherit
+uses: ./.github/workflows/main.yml
+with:
+  spark: 3.4.2
+  scala: 2.12
+  java: 11
+  image-type: ${{ matrix.image-type }}
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
index 879a9c2..ec0d66c 100644
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -29,6 +29,7 @@ on:
 type: choice
 options:
 - 3.5.0
+- 3.4.2
 - 3.4.1
 - 3.4.0
 - 3.3.3
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index 689981a..df79364 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -29,6 +29,7 @@ on:
 type: choice
 options:
 - 3.5.0
+- 3.4.2
 - 3.4.1
 - 3.4.0
 - 3.3.3
diff --git a/3.4.2/scala2.12-java11-python3-r-ubuntu/Dockerfile 
b/3.4.2/scala2.12-java11-python3-r-ubuntu/Dockerfile
new file mode 100644
index 0000000..7c7e96a
--- /dev/null
+++ b/3.4.2/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -0,0 +1,29 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS"

(spark-docker) branch master updated: Add support for java 17 from spark 3.5.0

2023-11-09 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 6f68fe0  Add support for java 17 from spark 3.5.0
6f68fe0 is described below

commit 6f68fe0f7051c10f2bf43a50a7decfce2e97baf0
Author: vakarisbk 
AuthorDate: Fri Nov 10 11:33:39 2023 +0800

Add support for java 17 from spark 3.5.0

### What changes were proposed in this pull request?
1. Create Java 17 base images alongside the Java 11 images, starting from Spark 3.5.0
2. Change the Ubuntu version to 22.04 for `scala2.12-java17-*`

### Why are the changes needed?

Spark supports multiple Java versions, but the images are currently built 
only with Java 11.

### Does this PR introduce _any_ user-facing change?

New images would be available in the repositories.

### How was this patch tested?

Closes #56 from vakarisbk/master.

Authored-by: vakarisbk 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/build_3.5.0.yaml |   3 +-
 .github/workflows/main.yml |  20 +++-
 .github/workflows/publish.yml  |   4 +-
 .github/workflows/test.yml |   3 +
 3.5.0/scala2.12-java17-python3-r-ubuntu/Dockerfile |  29 +
 3.5.0/scala2.12-java17-python3-ubuntu/Dockerfile   |  26 +
 3.5.0/scala2.12-java17-r-ubuntu/Dockerfile |  28 +
 3.5.0/scala2.12-java17-ubuntu/Dockerfile   |  79 +
 3.5.0/scala2.12-java17-ubuntu/entrypoint.sh| 130 +
 add-dockerfiles.sh |  23 +++-
 tools/ci_runner_cleaner/free_disk_space.sh |  53 +
 .../ci_runner_cleaner/free_disk_space_container.sh |  33 ++
 tools/template.py  |   2 +-
 versions.json  |  29 +
 14 files changed, 454 insertions(+), 8 deletions(-)

diff --git a/.github/workflows/build_3.5.0.yaml 
b/.github/workflows/build_3.5.0.yaml
index 6eb3ad6..9f2b2d6 100644
--- a/.github/workflows/build_3.5.0.yaml
+++ b/.github/workflows/build_3.5.0.yaml
@@ -31,11 +31,12 @@ jobs:
 strategy:
   matrix:
 image-type: ["all", "python", "scala", "r"]
+java: [11, 17]
 name: Run
 secrets: inherit
 uses: ./.github/workflows/main.yml
 with:
   spark: 3.5.0
   scala: 2.12
-  java: 11
+  java: ${{ matrix.java }}
   image-type: ${{ matrix.image-type }}
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index fe755ed..145b529 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -79,6 +79,14 @@ jobs:
   - name: Checkout Spark Docker repository
 uses: actions/checkout@v3
 
+  - name: Free up disk space
+shell: 'script -q -e -c "bash {0}"'
+run: |
+  chmod +x tools/ci_runner_cleaner/free_disk_space_container.sh
+  tools/ci_runner_cleaner/free_disk_space_container.sh
+  chmod +x tools/ci_runner_cleaner/free_disk_space.sh
+  tools/ci_runner_cleaner/free_disk_space.sh
+
   - name: Prepare - Generate tags
 run: |
   case "${{ inputs.image-type }}" in
@@ -195,7 +203,8 @@ jobs:
   - name : Test - Run spark application for standalone cluster on docker
 run: testing/run_tests.sh --image-url $IMAGE_URL --scala-version ${{ 
inputs.scala }} --spark-version ${{ inputs.spark }}
 
-  - name: Test - Checkout Spark repository
+  - name: Test - Checkout Spark repository for Spark 3.3.0 (with 
fetch-depth 0)
+if: inputs.spark == '3.3.0'
 uses: actions/checkout@v3
 with:
   fetch-depth: 0
@@ -203,6 +212,14 @@ jobs:
   ref: v${{ inputs.spark }}
   path: ${{ github.workspace }}/spark
 
+  - name: Test - Checkout Spark repository 
+if: inputs.spark != '3.3.0'
+uses: actions/checkout@v3
+with:
+  repository: apache/spark
+  ref: v${{ inputs.spark }}
+  path: ${{ github.workspace }}/spark 
+
   - name: Test - Cherry pick commits
 # Apache Spark enable resource limited k8s IT since v3.3.1, 
cherry-pick patches for old release
 # https://github.com/apache/spark/pull/36087#issuecomment-1251756266
@@ -247,6 +264,7 @@ jobs:
   # TODO(SPARK-44495): Resume to use the latest minikube for 
k8s-integration-tests.
   curl -LO 
https://storage.googleapis.com/minikube/releases/v1.30.1/minikube-linux-amd64
   sudo install minikube-linux-amd64 /usr/local/bin/minikube
+  rm minikube-linux-amd64
   # Github Action limit cpu:2, memory: 6947MB, limit to 2U6G for 
better resource statistic
   minikube start --cpus 2 --memory 6144
 
diff --git a/.github

[spark-docker] branch master updated: [SPARK-45169] Add official image Dockerfile for Apache Spark 3.5.0

2023-09-14 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 028efd4  [SPARK-45169] Add official image Dockerfile for Apache Spark 
3.5.0
028efd4 is described below

commit 028efd4637fb2cf791d5bd9ea70b2fca472de4b7
Author: Yikun Jiang 
AuthorDate: Thu Sep 14 21:22:32 2023 +0800

[SPARK-45169] Add official image Dockerfile for Apache Spark 3.5.0

### What changes were proposed in this pull request?
Add Apache Spark 3.5.0 Dockerfiles.

- Add 3.5.0 GPG key
- Add .github/workflows/build_3.5.0.yaml
- `./add-dockerfiles.sh 3.5.0` to generate dockerfiles
- Add version and tag info
- Backport 
https://github.com/apache/spark/commit/1d2c338c867c69987d8ed1f3666358af54a040e3 
and 
https://github.com/apache/spark/commit/0c7b4306c7c5fbdd6c54f8172f82e1d23e3b 
entrypoint changes

### Why are the changes needed?
Apache Spark 3.5.0 released

### Does this PR introduce _any_ user-facing change?
Docker image will be published.

### How was this patch tested?
Add workflow and CI passed

Closes #55 from Yikun/3.5.0.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/build_3.5.0.yaml | 41 +++
 .github/workflows/publish.yml  |  3 +-
 .github/workflows/test.yml |  3 +-
 3.5.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 29 
 3.5.0/scala2.12-java11-python3-ubuntu/Dockerfile   | 26 +++
 3.5.0/scala2.12-java11-r-ubuntu/Dockerfile | 28 
 3.5.0/scala2.12-java11-ubuntu/Dockerfile   | 79 ++
 .../scala2.12-java11-ubuntu/entrypoint.sh  |  4 ++
 entrypoint.sh.template |  4 ++
 tools/template.py  |  4 +-
 versions.json  | 42 ++--
 11 files changed, 253 insertions(+), 10 deletions(-)

diff --git a/.github/workflows/build_3.5.0.yaml 
b/.github/workflows/build_3.5.0.yaml
new file mode 100644
index 0000000..6eb3ad6
--- /dev/null
+++ b/.github/workflows/build_3.5.0.yaml
@@ -0,0 +1,41 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build and Test (3.5.0)"
+
+on:
+  pull_request:
+branches:
+  - 'master'
+paths:
+  - '3.5.0/**'
+
+jobs:
+  run-build:
+strategy:
+  matrix:
+image-type: ["all", "python", "scala", "r"]
+name: Run
+secrets: inherit
+uses: ./.github/workflows/main.yml
+with:
+  spark: 3.5.0
+  scala: 2.12
+  java: 11
+  image-type: ${{ matrix.image-type }}
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
index d213ada..8cfa95d 100644
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -25,9 +25,10 @@ on:
   spark:
 description: 'The Spark version of Spark image.'
 required: true
-default: '3.4.1'
+default: '3.5.0'
 type: choice
 options:
+- 3.5.0
 - 3.4.1
 - 3.4.0
 - 3.3.3
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index 4f0f741..47dac20 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -25,9 +25,10 @@ on:
   spark:
 description: 'The Spark version of Spark image.'
 required: true
-default: '3.4.1'
+default: '3.5.0'
 type: choice
 options:
+- 3.5.0
 - 3.4.1
 - 3.4.0
 - 3.3.3
diff --git a/3.5.0/scala2.12-java11-python3-r-ubuntu/Dockerfile 
b/3.5.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
new file mode 100644
index 0000000..d6faaa7
--- /dev/null
+++ b/3.5.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -0,0 +1,29 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.

[spark-docker] branch master updated: [SPARK-44494] Pin minikube to v1.30.1 to fix spark-docker K8s CI

2023-08-17 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 6fd201e  [SPARK-44494] Pin minikube to v1.30.1 to fix spark-docker K8s 
CI
6fd201e is described below

commit 6fd201e7c6e6a36c7a18e3b5877c3616081a05cf
Author: Yikun Jiang 
AuthorDate: Thu Aug 17 15:30:59 2023 +0800

[SPARK-44494] Pin minikube to v1.30.1 to fix spark-docker K8s CI

### What changes were proposed in this pull request?
Pin minikube to v1.30.1 to fix spark-docker K8s CI.

### Why are the changes needed?
Pin minikube to v1.30.1 to fix spark-docker K8s CI

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes #53 from Yikun/minikube.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/main.yml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 870c8c7..fe755ed 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -243,7 +243,9 @@ jobs:
   - name: Test - Start minikube
 run: |
   # See more in "Installation" https://minikube.sigs.k8s.io/docs/start/
-  curl -LO 
https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
+  # curl -LO 
https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
+  # TODO(SPARK-44495): Resume to use the latest minikube for 
k8s-integration-tests.
+  curl -LO 
https://storage.googleapis.com/minikube/releases/v1.30.1/minikube-linux-amd64
   sudo install minikube-linux-amd64 /usr/local/bin/minikube
   # Github Action limit cpu:2, memory: 6947MB, limit to 2U6G for 
better resource statistic
   minikube start --cpus 2 --memory 6144





[spark-docker] branch master updated: [SPARK-40513] Add --batch to gpg command

2023-06-29 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 58d2885  [SPARK-40513] Add --batch to gpg command
58d2885 is described below

commit 58d288546e8419d229f14b62b6a653999e0390f1
Author: Yikun Jiang 
AuthorDate: Thu Jun 29 16:05:47 2023 +0800

[SPARK-40513] Add --batch to gpg command

### What changes were proposed in this pull request?
Add `--batch` to the gpg commands, which essentially puts GnuPG into "API mode" instead of "UI mode" (non-interactive, suitable for scripted builds).
Apply the change to the 3.4.x Dockerfiles.
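
A condensed sketch of the resulting download-and-verify step (taken from the hunks below; `$GPG_KEY`, `spark.tgz`, and `spark.tgz.asc` are defined earlier in the Dockerfile):

```
export GNUPGHOME="$(mktemp -d)"
# --batch keeps GnuPG non-interactive: no prompts, and it fails fast on error.
gpg --batch --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \
gpg --batch --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"
gpg --batch --verify spark.tgz.asc spark.tgz
gpgconf --kill all
rm -rf "$GNUPGHOME" spark.tgz.asc
```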

### Why are the changes needed?
Address DOI comments: 
https://github.com/docker-library/official-images/pull/13089#issuecomment-1611814491

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes #51 from Yikun/batch.
    
    Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.4.0/scala2.12-java11-ubuntu/Dockerfile | 4 ++--
 3.4.1/scala2.12-java11-ubuntu/Dockerfile | 4 ++--
 Dockerfile.template  | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
index 854f86c..a4b081e 100644
--- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
@@ -46,8 +46,8 @@ RUN set -ex; \
 wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
 wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
 export GNUPGHOME="$(mktemp -d)"; \
-gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \
-gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \
+gpg --batch --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \
+gpg --batch --keyserver hkps://keyserver.ubuntu.com --recv-keys 
"$GPG_KEY"; \
 gpg --batch --verify spark.tgz.asc spark.tgz; \
 gpgconf --kill all; \
 rm -rf "$GNUPGHOME" spark.tgz.asc; \
diff --git a/3.4.1/scala2.12-java11-ubuntu/Dockerfile 
b/3.4.1/scala2.12-java11-ubuntu/Dockerfile
index 6d62769..d8bba7e 100644
--- a/3.4.1/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.4.1/scala2.12-java11-ubuntu/Dockerfile
@@ -46,8 +46,8 @@ RUN set -ex; \
 wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
 wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
 export GNUPGHOME="$(mktemp -d)"; \
-gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \
-gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \
+gpg --batch --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \
+gpg --batch --keyserver hkps://keyserver.ubuntu.com --recv-keys 
"$GPG_KEY"; \
 gpg --batch --verify spark.tgz.asc spark.tgz; \
 gpgconf --kill all; \
 rm -rf "$GNUPGHOME" spark.tgz.asc; \
diff --git a/Dockerfile.template b/Dockerfile.template
index 80b57e2..3d0aacf 100644
--- a/Dockerfile.template
+++ b/Dockerfile.template
@@ -46,8 +46,8 @@ RUN set -ex; \
 wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
 wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
 export GNUPGHOME="$(mktemp -d)"; \
-gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \
-gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \
+gpg --batch --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \
+gpg --batch --keyserver hkps://keyserver.ubuntu.com --recv-keys 
"$GPG_KEY"; \
 gpg --batch --verify spark.tgz.asc spark.tgz; \
 gpgconf --kill all; \
 rm -rf "$GNUPGHOME" spark.tgz.asc; \





[spark-docker] branch master updated: [SPARK-44168][FOLLOWUP] Change v3.4 GPG_KEY to full key fingerprint

2023-06-29 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 39264c5  [SPARK-44168][FOLLOWUP] Change v3.4 GPG_KEY to full key 
fingerprint
39264c5 is described below

commit 39264c502cf21b71a1ab5da71760e5864abce099
Author: Yikun Jiang 
AuthorDate: Thu Jun 29 16:04:50 2023 +0800

[SPARK-44168][FOLLOWUP] Change v3.4 GPG_KEY to full key fingerprint

### What changes were proposed in this pull request?

Change the GPG key from `34F0FC5C` to `F28C9C925C188C35E345614DEDA00CE834F0FC5C` to avoid a potential collision.

The full fingerprint can be obtained with the following commands:
```
$ wget https://dist.apache.org/repos/dist/dev/spark/KEYS
$ gpg --import KEYS
$ gpg --fingerprint 34F0FC5C

pub   rsa4096 2015-05-05 [SC]
  F28C 9C92 5C18 8C35 E345  614D EDA0 0CE8 34F0 FC5C
uid   [ unknown] Dongjoon Hyun (CODE SIGNING KEY) 

sub   rsa4096 2015-05-05 [E]

```

### Why are the changes needed?

- A short GPG key had been added as the v3.4.1 GPG key in https://github.com/apache/spark-docker/pull/46 .
- The short key `34F0FC5C` is from https://dist.apache.org/repos/dist/dev/spark/KEYS .
- According to DOI review comments ( https://github.com/docker-library/official-images/pull/13089#issuecomment-1609990551 ): `this should be the full key fingerprint: F28C9C925C188C35E345614DEDA00CE834F0FC5C (generating a collision for such a short key ID is trivial)`.
- We'd better switch from the short key ID to the full fingerprint.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes #50 from Yikun/gpg_key.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.4.1/scala2.12-java11-ubuntu/Dockerfile | 2 +-
 tools/template.py| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/3.4.1/scala2.12-java11-ubuntu/Dockerfile 
b/3.4.1/scala2.12-java11-ubuntu/Dockerfile
index bf106a6..6d62769 100644
--- a/3.4.1/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.4.1/scala2.12-java11-ubuntu/Dockerfile
@@ -38,7 +38,7 @@ RUN set -ex; \
 # https://downloads.apache.org/spark/KEYS
 ENV 
SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz
 \
 
SPARK_TGZ_ASC_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz.asc
 \
-GPG_KEY=34F0FC5C
+GPG_KEY=F28C9C925C188C35E345614DEDA00CE834F0FC5C
 
 RUN set -ex; \
 export SPARK_TMP="$(mktemp -d)"; \
diff --git a/tools/template.py b/tools/template.py
index 93e842a..cdc167c 100755
--- a/tools/template.py
+++ b/tools/template.py
@@ -31,7 +31,7 @@ GPG_KEY_DICT = {
 # issuer "xinr...@apache.org"
 "3.4.0": "CC68B3D16FE33A766705160BA7E57908C7A4E1B1",
 # issuer "dongj...@apache.org"
-"3.4.1": "34F0FC5C"
+"3.4.1": "F28C9C925C188C35E345614DEDA00CE834F0FC5C"
 }
 
 





[spark-docker] branch master updated: [SPARK-40513][DOCS] Add apache/spark docker image overview

2023-06-27 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new d02ff60  [SPARK-40513][DOCS] Add apache/spark docker image overview
d02ff60 is described below

commit d02ff6091835311a32c7ccc73d8ebae1d5817ecc
Author: Yikun Jiang 
AuthorDate: Tue Jun 27 14:28:21 2023 +0800

[SPARK-40513][DOCS] Add apache/spark docker image overview

### What changes were proposed in this pull request?
This PR adds `OVERVIEW.md`.

### Why are the changes needed?

This will be used on the https://hub.docker.com/r/apache/spark page to introduce the Spark Docker image and its tag info.

### Does this PR introduce _any_ user-facing change?
Yes, doc only

### How was this patch tested?
Doc only; verified by review.

Closes #34 from Yikun/overview.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 OVERVIEW.md | 83 +
 1 file changed, 83 insertions(+)

diff --git a/OVERVIEW.md b/OVERVIEW.md
new file mode 100644
index 0000000..046
--- /dev/null
+++ b/OVERVIEW.md
@@ -0,0 +1,83 @@
+# What is Apache Spark™?
+
+Apache Spark™ is a multi-language engine for executing data engineering, data 
science, and machine learning on single-node machines or clusters. It provides 
high-level APIs in Scala, Java, Python, and R, and an optimized engine that 
supports general computation graphs for data analysis. It also supports a rich 
set of higher-level tools including Spark SQL for SQL and DataFrames, pandas 
API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph 
processing, and Structu [...]
+
+https://spark.apache.org/
+
+## Online Documentation
+
+You can find the latest Spark documentation, including a programming guide, on 
the [project web page](https://spark.apache.org/documentation.html). This 
README file only contains basic setup instructions.
+
+## Interactive Scala Shell
+
+The easiest way to start using Spark is through the Scala shell:
+
+```
+docker run -it apache/spark /opt/spark/bin/spark-shell
+```
+
+Try the following command, which should return 1,000,000,000:
+
+```
+scala> spark.range(1000 * 1000 * 1000).count()
+```
+
+## Interactive Python Shell
+
+The easiest way to start using PySpark is through the Python shell:
+
+```
+docker run -it apache/spark /opt/spark/bin/pyspark
+```
+
+And run the following command, which should also return 1,000,000,000:
+
+```
+>>> spark.range(1000 * 1000 * 1000).count()
+```
+
+## Interactive R Shell
+
+The easiest way to start using R on Spark is through the R shell:
+
+```
+docker run -it apache/spark:r /opt/spark/bin/sparkR
+```
+
+## Running Spark on Kubernetes
+
+https://spark.apache.org/docs/latest/running-on-kubernetes.html
+
+## Supported tags and respective Dockerfile links
+
+Currently, the `apache/spark` docker image supports 4 types for each version:
+
+Such as for v3.4.0:
+- [3.4.0-scala2.12-java11-python3-ubuntu, 3.4.0-python3, 3.4.0, python3, 
latest](https://github.com/apache/spark-docker/tree/fe05e38f0ffad271edccd6ae40a77d5f14f3eef7/3.4.0/scala2.12-java11-python3-ubuntu)
+- [3.4.0-scala2.12-java11-r-ubuntu, 3.4.0-r, 
r](https://github.com/apache/spark-docker/tree/fe05e38f0ffad271edccd6ae40a77d5f14f3eef7/3.4.0/scala2.12-java11-r-ubuntu)
+- [3.4.0-scala2.12-java11-ubuntu, 3.4.0-scala, 
scala](https://github.com/apache/spark-docker/tree/fe05e38f0ffad271edccd6ae40a77d5f14f3eef7/3.4.0/scala2.12-java11-ubuntu)
+- 
[3.4.0-scala2.12-java11-python3-r-ubuntu](https://github.com/apache/spark-docker/tree/fe05e38f0ffad271edccd6ae40a77d5f14f3eef7/3.4.0/scala2.12-java11-python3-r-ubuntu)
+
+## Environment Variable
+
+The environment variables of entrypoint.sh are listed below:
+
+| Environment Variable | Meaning |
+|--|---|
+| SPARK_EXTRA_CLASSPATH | The extra path to be added to the classpath, see 
also in 
https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management
 |
+| PYSPARK_PYTHON | Python binary executable to use for PySpark in both driver 
and workers (default is python3 if available, otherwise python). Property 
spark.pyspark.python take precedence if it is set |
+| PYSPARK_DRIVER_PYTHON | Python binary executable to use for PySpark in 
driver only (default is PYSPARK_PYTHON). Property spark.pyspark.driver.python 
take precedence if it is set |
+| SPARK_DIST_CLASSPATH | Distribution-defined classpath to add to processes |
+| SPARK_DRIVER_BIND_ADDRESS | Hostname or IP address where to bind listening 
sockets. See also `spark.driver.bindAddress` |
+| SPARK_EXECUTOR_JAVA_OPTS | The Java opts of Spark Executor |
+| SPARK_APPLICATION_ID | A unique identifier for the Spark application |
+| SPARK_EXECUTOR_POD_IP | The Pod IP address of spark executor |
+| SPARK_RESOURC

[spark-docker] branch master updated: [SPARK-44175] Remove useless lib64 path link in dockerfile

2023-06-27 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 5405b49  [SPARK-44175] Remove useless lib64 path link in dockerfile
5405b49 is described below

commit 5405b49b52aa1661d31ac80cdb8c9aad530d6847
Author: Yikun Jiang 
AuthorDate: Tue Jun 27 14:09:34 2023 +0800

[SPARK-44175] Remove useless lib64 path link in dockerfile

### What changes were proposed in this pull request?
Remove the useless `/lib64` path link.

### Why are the changes needed?
Address comments: 
https://github.com/docker-library/official-images/pull/13089#issuecomment-1601813499

It was introduced by https://github.com/apache/spark/commit/f13ea15d79fb4752a0a75a05a4a89bd8625ea3d5 to address a snappy issue on Alpine, but the OS has since been switched to Ubuntu, so the `/lib64` hack can be cleaned up.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes #48 from Yikun/rm-lib64-hack.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.4.0/scala2.12-java11-ubuntu/Dockerfile | 1 -
 3.4.1/scala2.12-java11-ubuntu/Dockerfile | 1 -
 Dockerfile.template  | 1 -
 3 files changed, 3 deletions(-)

diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
index 77ace47..854f86c 100644
--- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
@@ -23,7 +23,6 @@ RUN groupadd --system --gid=${spark_uid} spark && \
 
 RUN set -ex; \
 apt-get update; \
-ln -s /lib /lib64; \
 apt-get install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user 
libnss3 procps net-tools gosu libnss-wrapper; \
 mkdir -p /opt/spark; \
 mkdir /opt/spark/python; \
diff --git a/3.4.1/scala2.12-java11-ubuntu/Dockerfile 
b/3.4.1/scala2.12-java11-ubuntu/Dockerfile
index e782686..bf106a6 100644
--- a/3.4.1/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.4.1/scala2.12-java11-ubuntu/Dockerfile
@@ -23,7 +23,6 @@ RUN groupadd --system --gid=${spark_uid} spark && \
 
 RUN set -ex; \
 apt-get update; \
-ln -s /lib /lib64; \
 apt-get install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user 
libnss3 procps net-tools gosu libnss-wrapper; \
 mkdir -p /opt/spark; \
 mkdir /opt/spark/python; \
diff --git a/Dockerfile.template b/Dockerfile.template
index 6fedce9..80b57e2 100644
--- a/Dockerfile.template
+++ b/Dockerfile.template
@@ -23,7 +23,6 @@ RUN groupadd --system --gid=${spark_uid} spark && \
 
 RUN set -ex; \
 apt-get update; \
-ln -s /lib /lib64; \
 apt-get install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user 
libnss3 procps net-tools gosu libnss-wrapper; \
 mkdir -p /opt/spark; \
 mkdir /opt/spark/python; \





[spark-docker] branch master updated: [SPARK-44177] Add 'set -eo pipefail' to entrypoint and quote variables

2023-06-26 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 6022289  [SPARK-44177] Add 'set -eo pipefail' to entrypoint and quote 
variables
6022289 is described below

commit 60222892836549f05c56edd49ac81c688c8e7356
Author: Yikun Jiang 
AuthorDate: Tue Jun 27 08:59:03 2023 +0800

[SPARK-44177] Add 'set -eo pipefail' to entrypoint and quote variables

### What changes were proposed in this pull request?
Add 'set -eo pipefail' to the entrypoint script and quote its variables.

### Why are the changes needed?
Address DOI comments:
1. Have you considered a set -eo pipefail on the entrypoint script to help 
prevent any errors from being silently ignored?
2. You probably want to quote this (and many of the other variables in this 
execution); ala --driver-url "$SPARK_DRIVER_URL"

[1] 
https://github.com/docker-library/official-images/pull/13089#issuecomment-1601334895
[2] 
https://github.com/docker-library/official-images/pull/13089#issuecomment-1601813499
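
An illustrative shell session, not from the patch itself, showing the two failure modes the reviewers describe:

```
# Unquoted expansion: an empty variable silently vanishes from the argv.
SPARK_DRIVER_URL=""
set -- --driver-url $SPARK_DRIVER_URL;   echo $#   # 1, the value was dropped
set -- --driver-url "$SPARK_DRIVER_URL"; echo $#   # 2, empty arg preserved

# Without pipefail, a failure on the left side of a pipe is ignored.
false | true; echo "$?"    # 0
set -o pipefail
false | true; echo "$?"    # 1, the error now surfaces
```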

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes #49 from Yikun/quote.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.4.0/scala2.12-java11-ubuntu/entrypoint.sh | 31 -
 3.4.1/scala2.12-java11-ubuntu/entrypoint.sh | 31 -
 entrypoint.sh.template  | 31 -
 3 files changed, 51 insertions(+), 42 deletions(-)

diff --git a/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh 
b/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh
index 08fc925..2e3d2a8 100755
--- a/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh
+++ b/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh
@@ -15,6 +15,9 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+# Prevent any errors from being silently ignored
+set -eo pipefail
+
 attempt_setup_fake_passwd_entry() {
   # Check whether there is a passwd entry for the container UID
   local myuid; myuid="$(id -u)"
@@ -51,10 +54,10 @@ if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
   SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
 fi
 
-if ! [ -z ${PYSPARK_PYTHON+x} ]; then
+if ! [ -z "${PYSPARK_PYTHON+x}" ]; then
 export PYSPARK_PYTHON
 fi
-if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then
+if ! [ -z "${PYSPARK_DRIVER_PYTHON+x}" ]; then
 export PYSPARK_DRIVER_PYTHON
 fi
 
@@ -64,13 +67,13 @@ if [ -n "${HADOOP_HOME}"  ] && [ -z 
"${SPARK_DIST_CLASSPATH}"  ]; then
   export SPARK_DIST_CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath)"
 fi
 
-if ! [ -z ${HADOOP_CONF_DIR+x} ]; then
+if ! [ -z "${HADOOP_CONF_DIR+x}" ]; then
   SPARK_CLASSPATH="$HADOOP_CONF_DIR:$SPARK_CLASSPATH";
 fi
 
-if ! [ -z ${SPARK_CONF_DIR+x} ]; then
+if ! [ -z "${SPARK_CONF_DIR+x}" ]; then
   SPARK_CLASSPATH="$SPARK_CONF_DIR:$SPARK_CLASSPATH";
-elif ! [ -z ${SPARK_HOME+x} ]; then
+elif ! [ -z "${SPARK_HOME+x}" ]; then
   SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH";
 fi
 
@@ -99,17 +102,17 @@ case "$1" in
 CMD=(
   ${JAVA_HOME}/bin/java
   "${SPARK_EXECUTOR_JAVA_OPTS[@]}"
-  -Xms$SPARK_EXECUTOR_MEMORY
-  -Xmx$SPARK_EXECUTOR_MEMORY
+  -Xms"$SPARK_EXECUTOR_MEMORY"
+  -Xmx"$SPARK_EXECUTOR_MEMORY"
   -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH"
   org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBackend
-  --driver-url $SPARK_DRIVER_URL
-  --executor-id $SPARK_EXECUTOR_ID
-  --cores $SPARK_EXECUTOR_CORES
-  --app-id $SPARK_APPLICATION_ID
-  --hostname $SPARK_EXECUTOR_POD_IP
-  --resourceProfileId $SPARK_RESOURCE_PROFILE_ID
-  --podName $SPARK_EXECUTOR_POD_NAME
+  --driver-url "$SPARK_DRIVER_URL"
+  --executor-id "$SPARK_EXECUTOR_ID"
+  --cores "$SPARK_EXECUTOR_CORES"
+  --app-id "$SPARK_APPLICATION_ID"
+  --hostname "$SPARK_EXECUTOR_POD_IP"
+  --resourceProfileId "$SPARK_RESOURCE_PROFILE_ID"
+  --podName "$SPARK_EXECUTOR_POD_NAME"
 )
 attempt_setup_fake_passwd_entry
 # Execute the container CMD under tini for better hygiene
diff --git a/3.4.1/scala2.12-java11-ubuntu/entrypoint.sh 
b/3.4.1/scala2.12-java11-ubuntu/entrypoint.sh
index 08fc925..2e3d2a8 100755
--- a/3.4.1/scala2.12-java11-ubuntu/entrypoint.sh
+++ b/3.4.1/scala2.12-java11-ubuntu/entrypoint.sh
@@ -15,6 +15,9 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+# Prevent any errors fr

[spark-docker] branch master updated: [SPARK-44176] Change apt to apt-get and remove useless cleanup

2023-06-26 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 6f1a0a5  [SPARK-44176] Change apt to apt-get and remove useless cleanup
6f1a0a5 is described below

commit 6f1a0a5fbb8034ebc4ea04e4f0b2fda728a4dd1e
Author: Yikun Jiang 
AuthorDate: Tue Jun 27 08:56:54 2023 +0800

[SPARK-44176] Change apt to apt-get and remove useless cleanup

### What changes were proposed in this pull request?
This patch changes `apt` to `apt-get` and removes the useless `rm -rf /var/cache/apt/*; \`.
The change is also applied to 3.4.0 and 3.4.1.

### Why are the changes needed?
Address comments from DOI (see the resulting pattern below):
- `apt install ...`: this should be `apt-get` (apt is not intended for unattended use, as the warning during build makes clear).
- `rm -rf /var/cache/apt/*; \`: this is harmless, but should be unnecessary (the base image configuration already makes sure this directory stays empty).

See more in:
[1] 
https://github.com/docker-library/official-images/pull/13089#issuecomment-1601813499
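
For reference, the resulting install pattern (condensed from the hunks below):

```
# apt-get is the scriptable front end; -y keeps the install unattended.
apt-get update
apt-get install -y python3 python3-pip
# Only the package lists need cleanup; the base image keeps /var/cache/apt empty.
rm -rf /var/lib/apt/lists/*
```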

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes #47 from Yikun/apt-get.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 5 ++---
 3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile   | 3 +--
 3.4.0/scala2.12-java11-r-ubuntu/Dockerfile | 3 +--
 3.4.0/scala2.12-java11-ubuntu/Dockerfile   | 3 +--
 3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile | 5 ++---
 3.4.1/scala2.12-java11-python3-ubuntu/Dockerfile   | 3 +--
 3.4.1/scala2.12-java11-r-ubuntu/Dockerfile | 3 +--
 3.4.1/scala2.12-java11-ubuntu/Dockerfile   | 3 +--
 Dockerfile.template| 3 +--
 r-python.template  | 5 ++---
 10 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
index 0f1962f..10aa23e 100644
--- a/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -20,9 +20,8 @@ USER root
 
 RUN set -ex; \
 apt-get update; \
-apt install -y python3 python3-pip; \
-apt install -y r-base r-base-dev; \
-rm -rf /var/cache/apt/*; \
+apt-get install -y python3 python3-pip; \
+apt-get install -y r-base r-base-dev; \
 rm -rf /var/lib/apt/lists/*
 
 ENV R_HOME /usr/lib/R
diff --git a/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile
index 258d806..3240e57 100644
--- a/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile
@@ -20,8 +20,7 @@ USER root
 
 RUN set -ex; \
 apt-get update; \
-apt install -y python3 python3-pip; \
-rm -rf /var/cache/apt/*; \
+apt-get install -y python3 python3-pip; \
 rm -rf /var/lib/apt/lists/*
 
 USER spark
diff --git a/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile
index 4c928c6..266392f 100644
--- a/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile
@@ -20,8 +20,7 @@ USER root
 
 RUN set -ex; \
 apt-get update; \
-apt install -y r-base r-base-dev; \
-rm -rf /var/cache/apt/*; \
+apt-get install -y r-base r-base-dev; \
 rm -rf /var/lib/apt/lists/*
 
 ENV R_HOME /usr/lib/R
diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
index aa754b7..77ace47 100644
--- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
@@ -24,7 +24,7 @@ RUN groupadd --system --gid=${spark_uid} spark && \
 RUN set -ex; \
 apt-get update; \
 ln -s /lib /lib64; \
-apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user 
libnss3 procps net-tools gosu libnss-wrapper; \
+apt-get install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user 
libnss3 procps net-tools gosu libnss-wrapper; \
 mkdir -p /opt/spark; \
 mkdir /opt/spark/python; \
 mkdir -p /opt/spark/examples; \
@@ -33,7 +33,6 @@ RUN set -ex; \
 touch /opt/spark/RELEASE; \
 chown -R spark:spark /opt/spark; \
 echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su; \
-rm -rf /var/cache/apt/*; \
 rm -rf /var/lib/apt/lists/*
 
 # Install Apache Spark
diff --git a/3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile 
b/3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile
index 95c98b9..30e6b86 100644
--- a/3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile
+++ b/3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -20,9 +20,8 @@ USER root
 
 RUN set -ex; \
 apt-get updat

[spark-docker] branch master updated: [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles

2023-06-24 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 6f36415  [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles
6f36415 is described below

commit 6f3641534a97a80491cba926cc7a5e67972494ea
Author: Yikun Jiang 
AuthorDate: Sun Jun 25 10:51:46 2023 +0800

[SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles

### What changes were proposed in this pull request?
Add Apache Spark 3.4.1 Dockerfiles.
- Add 3.4.1 GPG key
- Add .github/workflows/build_3.4.1.yaml
- ./add-dockerfiles.sh 3.4.1
- Add version and tag info

### Why are the changes needed?
Apache Spark 3.4.1 released:
https://spark.apache.org/releases/spark-release-3-4-1.html

### Does this PR introduce _any_ user-facing change?
Docker image will be published.

### How was this patch tested?
Add workflow and CI passed

Closes #46 from Yikun/3.4.1.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/build_3.4.1.yaml |  41 +++
 .github/workflows/publish.yml  |   3 +-
 .github/workflows/test.yml |   3 +-
 3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile |  30 +
 3.4.1/scala2.12-java11-python3-ubuntu/Dockerfile   |  27 +
 3.4.1/scala2.12-java11-r-ubuntu/Dockerfile |  29 +
 3.4.1/scala2.12-java11-ubuntu/Dockerfile   |  81 ++
 3.4.1/scala2.12-java11-ubuntu/entrypoint.sh| 123 +
 tools/template.py  |   2 +
 versions.json  |  42 +--
 10 files changed, 372 insertions(+), 9 deletions(-)

diff --git a/.github/workflows/build_3.4.1.yaml 
b/.github/workflows/build_3.4.1.yaml
new file mode 100644
index 0000000..2eba18e
--- /dev/null
+++ b/.github/workflows/build_3.4.1.yaml
@@ -0,0 +1,41 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build and Test (3.4.1)"
+
+on:
+  pull_request:
+branches:
+  - 'master'
+paths:
+  - '3.4.1/**'
+
+jobs:
+  run-build:
+strategy:
+  matrix:
+image-type: ["all", "python", "scala", "r"]
+name: Run
+secrets: inherit
+uses: ./.github/workflows/main.yml
+with:
+  spark: 3.4.1
+  scala: 2.12
+  java: 11
+  image-type: ${{ matrix.image-type }}
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
index 3063bfe..1138a9f 100644
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -25,9 +25,10 @@ on:
   spark:
 description: 'The Spark version of Spark image.'
 required: true
-default: '3.4.0'
+default: '3.4.1'
 type: choice
 options:
+- 3.4.1
 - 3.4.0
 - 3.3.2
 - 3.3.1
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index 06e2321..4136f1c 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -25,9 +25,10 @@ on:
   spark:
 description: 'The Spark version of Spark image.'
 required: true
-default: '3.4.0'
+default: '3.4.1'
 type: choice
 options:
+- 3.4.1
 - 3.4.0
 - 3.3.2
 - 3.3.1
diff --git a/3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile 
b/3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile
new file mode 100644
index 000..95c98b9
--- /dev/null
+++ b/3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -0,0 +1,30 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICEN

[spark-docker] branch master updated: [SPARK-43368] Use `libnss_wrapper` to fake passwd entry

2023-06-01 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new c07ae18  [SPARK-43368] Use `libnss_wrapper` to fake passwd entry
c07ae18 is described below

commit c07ae18355678370fd270bedb8b39ab2aceb5ac2
Author: Yikun Jiang 
AuthorDate: Fri Jun 2 10:27:01 2023 +0800

[SPARK-43368] Use `libnss_wrapper` to fake passwd entry

### What changes were proposed in this pull request?
Use `libnss_wrapper` to fake the passwd entry instead of modifying `/etc/passwd`, resolving the random-UID problem. We only attempt to set up the fake passwd entry for the driver/executor; for pass-through commands like `bash`, no fake entry is set.
### Why are the changes needed?
In the past, we added the entry to `/etc/passwd` directly for the current UID, mainly for the [OpenShift anonymous random `uid` case](https://github.com/docker-library/official-images/pull/13089#issuecomment-1534706523) (see also https://github.com/apache-spark-on-k8s/spark/pull/404), but this approach brings a potential security issue: it requires overly wide permissions on `/etc/passwd`.

According to a DOI reviewer [suggestion](https://github.com/docker-library/official-images/pull/13089#issuecomment-1561793792), it is better to resolve this with [libnss_wrapper](https://cwrap.org/nss_wrapper.html), a library that fakes the passwd entry by setting `LD_PRELOAD`, `NSS_WRAPPER_PASSWD`, and `NSS_WRAPPER_GROUP`. For example, with a random UID of `1000`, the environment looks like:

```
spark6f41b8e5be9b:/opt/spark/work-dir$ id -u
1000
spark6f41b8e5be9b:/opt/spark/work-dir$ id -g
1000
spark6f41b8e5be9b:/opt/spark/work-dir$ whoami
spark
spark6f41b8e5be9b:/opt/spark/work-dir$ echo $LD_PRELOAD
/usr/lib/libnss_wrapper.so
spark6f41b8e5be9b:/opt/spark/work-dir$ echo $NSS_WRAPPER_PASSWD
/tmp/tmp.r5x4SMX35B
spark6f41b8e5be9b:/opt/spark/work-dir$ cat /tmp/tmp.r5x4SMX35B
spark:x:1000:1000:${SPARK_USER_NAME:-anonymous uid}:/opt/spark:/bin/false
spark6f41b8e5be9b:/opt/spark/work-dir$ echo $NSS_WRAPPER_GROUP
/tmp/tmp.XcnnYuD68r
spark6f41b8e5be9b:/opt/spark/work-dir$ cat /tmp/tmp.XcnnYuD68r
spark:x:1000:
```

### Does this PR introduce _any_ user-facing change?
Yes, fake env vars are set up rather than changing `/etc/passwd`.

### How was this patch tested?
 1. Without `attempt_setup_fake_passwd_entry`, the user is `I have no 
name!`
```
# docker run -it --rm --user 1000:1000  spark-test bash
groups: cannot find name for group ID 1000
I have no name!998110cd5a26:/opt/spark/work-dir$
I have no name!0fea1d27d67d:/opt/spark/work-dir$ id -u
1000
I have no name!0fea1d27d67d:/opt/spark/work-dir$ id -g
1000
I have no name!0fea1d27d67d:/opt/spark/work-dir$ whoami
whoami: cannot find name for user ID 1000
```

 2. Manually stub `attempt_setup_fake_passwd_entry`; the user is `spark`.
2.1 Apply a tmp change to cmd

```patch
diff --git a/entrypoint.sh.template b/entrypoint.sh.template
index 08fc925..77d5b04 100644
--- a/entrypoint.sh.template
+++ b/entrypoint.sh.template
@@ -118,6 +118,7 @@ case "$1" in

   *)
 # Non-spark-on-k8s command provided, proceeding in pass-through mode...
+attempt_setup_fake_passwd_entry
exec "$@"
 ;;
 esac
```

2.2 Build and run the image, specify a random UID/GID 1000

```bash
$ docker build . -t spark-test
$ docker run -it --rm --user 1000:1000  spark-test bash
# the user is set to spark rather than an unknown user
spark6f41b8e5be9b:/opt/spark/work-dir$
spark6f41b8e5be9b:/opt/spark/work-dir$ id -u
1000
spark6f41b8e5be9b:/opt/spark/work-dir$ id -g
1000
spark6f41b8e5be9b:/opt/spark/work-dir$ whoami
spark

```

```
# NSS env vars are set correctly
spark6f41b8e5be9b:/opt/spark/work-dir$ echo $LD_PRELOAD
/usr/lib/libnss_wrapper.so
spark6f41b8e5be9b:/opt/spark/work-dir$ echo $NSS_WRAPPER_PASSWD
/tmp/tmp.r5x4SMX35B
spark6f41b8e5be9b:/opt/spark/work-dir$ cat /tmp/tmp.r5x4SMX35B
spark:x:1000:1000:${SPARK_USER_NAME:-anonymous uid}:/opt/spark:/bin/false
spark6f41b8e5be9b:/opt/spark/work-dir$ echo $NSS_WRAPPER_GROUP
/tmp/tmp.XcnnYuD68r
spark6f41b8e5be9b:/opt/spark/work-dir$ cat /tmp/tmp.XcnnYuD68r
spark:x:1000:
```

 3. If the current existing user is specified (such as `spark`, `root`), no fake 
setup is performed
```bash
# docker run -it --rm --user 0  spark-test bash
roote5bf55d4df22:/opt/spark/work-dir# echo $LD_PRELOAD

```

```bash
# docker run -it --rm  spark-test bash
sparkdef8d8ca4e7d:/opt/spark/work-dir$ echo $LD_PRELOAD

```

Closes #45 from Yikun/SPARK-43368.
    
    Authored-by: Yikun Jiang
    Signed-off-by: Yikun Jiang

[spark-docker] branch master updated: [SPARK-43370] Switch spark user only when run driver and executor

2023-06-01 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 2dc12d9  [SPARK-43370] Switch spark user only when run driver and 
executor
2dc12d9 is described below

commit 2dc12d96910710aa6ee2d717c4c723ddd75127a1
Author: Yikun Jiang 
AuthorDate: Thu Jun 1 14:36:17 2023 +0800

[SPARK-43370] Switch spark user only when run driver and executor

### What changes were proposed in this pull request?
Switch to the spark user only when running the driver and executor

### Why are the changes needed?
Address DOI comments: question 7 [1]

[1] 
https://github.com/docker-library/official-images/pull/13089#issuecomment-1533540388
[2] 
https://github.com/docker-library/official-images/pull/13089#issuecomment-1561793792

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
1. Tested manually
```
cd ~/spark-docker/3.4.0/scala2.12-java11-ubuntu
$ docker build . -t spark-test

$ docker run -ti spark-test bash
sparkafa78af05cf8:/opt/spark/work-dir$

$ docker run  --user root  -ti spark-test bash
root095e0d7651fd:/opt/spark/work-dir#
```
2. CI passed

Closes: https://github.com/apache/spark-docker/pull/44

Closes #43 from Yikun/SPARK-43370.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile |  4 
 3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile   |  4 
 3.4.0/scala2.12-java11-r-ubuntu/Dockerfile |  4 
 3.4.0/scala2.12-java11-ubuntu/Dockerfile   |  2 ++
 3.4.0/scala2.12-java11-ubuntu/entrypoint.sh| 23 +++---
 Dockerfile.template|  2 ++
 entrypoint.sh.template | 23 +++---
 r-python.template  |  4 
 8 files changed, 44 insertions(+), 22 deletions(-)

diff --git a/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
index 7734100..0f1962f 100644
--- a/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -16,6 +16,8 @@
 #
 FROM spark:3.4.0-scala2.12-java11-ubuntu
 
+USER root
+
 RUN set -ex; \
 apt-get update; \
 apt install -y python3 python3-pip; \
@@ -24,3 +26,5 @@ RUN set -ex; \
 rm -rf /var/lib/apt/lists/*
 
 ENV R_HOME /usr/lib/R
+
+USER spark
diff --git a/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile
index 6c12c30..258d806 100644
--- a/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile
@@ -16,8 +16,12 @@
 #
 FROM spark:3.4.0-scala2.12-java11-ubuntu
 
+USER root
+
 RUN set -ex; \
 apt-get update; \
 apt install -y python3 python3-pip; \
 rm -rf /var/cache/apt/*; \
 rm -rf /var/lib/apt/lists/*
+
+USER spark
diff --git a/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile
index 24cd41a..4c928c6 100644
--- a/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile
@@ -16,6 +16,8 @@
 #
 FROM spark:3.4.0-scala2.12-java11-ubuntu
 
+USER root
+
 RUN set -ex; \
 apt-get update; \
 apt install -y r-base r-base-dev; \
@@ -23,3 +25,5 @@ RUN set -ex; \
 rm -rf /var/lib/apt/lists/*
 
 ENV R_HOME /usr/lib/R
+
+USER spark
diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
index 205b399..a680106 100644
--- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
@@ -77,4 +77,6 @@ ENV SPARK_HOME /opt/spark
 
 WORKDIR /opt/spark/work-dir
 
+USER spark
+
 ENTRYPOINT [ "/opt/entrypoint.sh" ]
diff --git a/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh 
b/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh
index 716f1af..6def3f9 100755
--- a/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh
+++ b/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh
@@ -69,6 +69,13 @@ elif ! [ -z ${SPARK_HOME+x} ]; then
   SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH";
 fi
 
+# Switch to spark if no USER specified (root by default) otherwise use USER 
directly
+switch_spark_if_root() {
+  if [ $(id -u) -eq 0 ]; then
+echo gosu spark
+  fi
+}
+
 case "$1" in
   driver)
 shift 1
@@ -78,6 +85,8 @@ case "$1" in
   --deploy-mode client
   "$@"
 )
+# Execute the container CMD under tini for better hygiene
+exec $(switch_spark_if_root) /usr/bin/tini -s -- "${CMD[@]}"
 ;;
   executor)
 shift 1
@@ -96,20 +105,12 @@ case "$1" in
   --resourceProfileId $SPARK_RESOURCE_PROFILE_ID
   --podName $SPARK_EXECUTOR_POD_NAME
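
(Editorial aside, not part of the commit: the effect of `switch_spark_if_root`
on the exec lines above can be shown with a small self-contained sketch; it
assumes gosu and tini are installed, as they are in these images.)

```bash
#!/usr/bin/env bash
# Sketch only: prefix the command with `gosu spark` when running as root.
switch_spark_if_root() {
  if [ "$(id -u)" -eq 0 ]; then
    echo gosu spark  # emit the privilege-drop prefix only for root
  fi
}

CMD=(id -un)
# As root:     expands to `exec gosu spark /usr/bin/tini -s -- id -un`
# As uid 1000: expands to `exec /usr/bin/tini -s -- id -un`
exec $(switch_spark_if_root) /usr/bin/tini -s -- "${CMD[@]}"
```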

[spark-docker] branch master updated: [SPARK-43806] Add awesome-spark-docker.md

2023-05-25 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 9d4c98c  [SPARK-43806] Add awesome-spark-docker.md
9d4c98c is described below

commit 9d4c98c62c4ce517e69e65d1f6f7bf412d775b75
Author: Yikun Jiang 
AuthorDate: Fri May 26 09:53:20 2023 +0800

[SPARK-43806] Add awesome-spark-docker.md

### What changes were proposed in this pull request?
Add links to more related images and Dockerfile references.

### Why are the changes needed?
Something we talked about in the "Spark on Kube Coffee Chats" [1]: add links to 
more related images and Dockerfile references. Initialized with [2].
[1] https://lists.apache.org/thread/26gpmlhqhk5cp2fhtzrpl5f61p8jc551
[2] 
https://github.com/awesome-spark/awesome-spark/blob/main/README.md#docker-images

### Does this PR introduce _any_ user-facing change?
Doc only

### How was this patch tested?
No

Closes #28 from Yikun/awesome-spark-docker.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 awesome-spark-docker.md | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/awesome-spark-docker.md b/awesome-spark-docker.md
new file mode 100644
index 000..c7bb840
--- /dev/null
+++ b/awesome-spark-docker.md
@@ -0,0 +1,7 @@
+A curated list of awesome Apache Spark Docker resources.
+
+- 
[jupyter/docker-stacks/pyspark-notebook](https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook)
 - PySpark with Jupyter Notebook.
+- 
[big-data-europe/docker-spark](https://github.com/big-data-europe/docker-spark) 
- The standalone cluster and spark applications related Dockerfiles.
+- 
[openeuler/spark](https://github.com/openeuler-mirror/openeuler-docker-images/tree/master/spark)
 - Dockerfile reference for dnf/yum based OS.
+- 
[GoogleCloudPlatform/spark-on-k8s-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)
 - Kubernetes operator for managing the lifecycle of Apache Spark applications 
on Kubernetes.
+


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-docker] branch master updated: [SPARK-43367] Recover sh in dockerfile

2023-05-25 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new ce3e122  [SPARK-43367] Recover sh in dockerfile
ce3e122 is described below

commit ce3e12266ef82264b814f6f7823165f7c7ae215a
Author: Yikun Jiang 
AuthorDate: Thu May 25 19:07:55 2023 +0800

[SPARK-43367] Recover sh in dockerfile

### What changes were proposed in this pull request?
Recover `sh`. We removed `sh` due to 
https://github.com/apache-spark-on-k8s/spark/pull/444/files#r134075892 ; now the 
`SPARK_DRIVER_JAVA_OPTS`-related code has already moved to `entrypoint.sh` with 
`#!/bin/bash`, so we no longer need this hack.

See also:
[1] 
https://github.com/docker-library/official-images/pull/13089#issuecomment-1533540388
[2] 
https://github.com/docker-library/official-images/pull/13089#issuecomment-1561793792

### Why are the changes needed?
Recover sh

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes #41 from Yikun/SPARK-43367.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.4.0/scala2.12-java11-ubuntu/Dockerfile | 2 --
 Dockerfile.template  | 2 --
 2 files changed, 4 deletions(-)

diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
index 11f997f..205b399 100644
--- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
@@ -32,8 +32,6 @@ RUN set -ex; \
 chmod g+w /opt/spark/work-dir; \
 touch /opt/spark/RELEASE; \
 chown -R spark:spark /opt/spark; \
-rm /bin/sh; \
-ln -sv /bin/bash /bin/sh; \
 echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su; \
 chgrp root /etc/passwd && chmod ug+rw /etc/passwd; \
 rm -rf /var/cache/apt/*; \
diff --git a/Dockerfile.template b/Dockerfile.template
index 6e85cd3..8b13e4a 100644
--- a/Dockerfile.template
+++ b/Dockerfile.template
@@ -32,8 +32,6 @@ RUN set -ex; \
 chmod g+w /opt/spark/work-dir; \
 touch /opt/spark/RELEASE; \
 chown -R spark:spark /opt/spark; \
-rm /bin/sh; \
-ln -sv /bin/bash /bin/sh; \
 echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su; \
 chgrp root /etc/passwd && chmod ug+rw /etc/passwd; \
 rm -rf /var/cache/apt/*; \


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-docker] branch master updated: [SPARK-43793] Fix SPARK_EXECUTOR_JAVA_OPTS assignment bug

2023-05-25 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 006e8fa  [SPARK-43793] Fix SPARK_EXECUTOR_JAVA_OPTS assignment bug
006e8fa is described below

commit 006e8fade69f148a05fc73f591f52c7678e48f04
Author: Yikun Jiang 
AuthorDate: Thu May 25 19:05:26 2023 +0800

[SPARK-43793] Fix SPARK_EXECUTOR_JAVA_OPTS assignment bug

### What changes were proposed in this pull request?
The previous code was susceptible to a few bugs, particularly around newlines 
in values:
```
env | grep SPARK_JAVA_OPT_ | sort -t_ -k4 -n | sed 's/[^=]*=\(.*\)/\1/g' > 
/tmp/java_opts.txt
readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt
```
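
For illustration (not part of the commit), a bash sketch of the failure mode and
the fix: a newline inside a value is silently truncated by the old env/readarray
pipeline, while iterating the matching variable names with indirect expansion
keeps each value intact (the `SPARK_JAVA_OPT_0` value below is a made-up example):

```bash
#!/usr/bin/env bash
export SPARK_JAVA_OPT_0=$'-Dkey=line1\nline2'

# Old approach (simplified): `env` prints the value across two lines, grep
# keeps only the first, so everything after the newline is lost.
env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt
readarray -t OLD_OPTS < /tmp/java_opts.txt
printf '%s\n' "${OLD_OPTS[@]}"   # -Dkey=line1   (line2 is gone)

# New approach: expand the matching variable names, then each value whole.
NEW_OPTS=()
for v in "${!SPARK_JAVA_OPT_@}"; do
    NEW_OPTS+=( "${!v}" )
done
printf '%s\n' "${NEW_OPTS[@]}"   # -Dkey=line1
                                 # line2        (value preserved)
```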

### Why are the changes needed?
To address DOI comments: 
https://github.com/docker-library/official-images/pull/13089#issuecomment-1533540388
 , question 6.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
1. Tested manually
```
export SPARK_JAVA_OPT_0="foo=bar"
export SPARK_JAVA_OPT_1="foo1=bar1"

for v in "${!SPARK_JAVA_OPT_@}"; do
SPARK_EXECUTOR_JAVA_OPTS+=( "${!v}" )
done

for v in ${SPARK_EXECUTOR_JAVA_OPTS[@]}; do
echo $v
done

# foo=bar
# foo1=bar1
```
    2. CI passed

Closes #42 from Yikun/SPARK-43793.

    Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.4.0/scala2.12-java11-ubuntu/entrypoint.sh | 5 +++--
 entrypoint.sh.template  | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh 
b/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh
index 4bb1557..716f1af 100755
--- a/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh
+++ b/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh
@@ -38,8 +38,9 @@ if [ -z "$JAVA_HOME" ]; then
 fi
 
 SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
-env | grep SPARK_JAVA_OPT_ | sort -t_ -k4 -n | sed 's/[^=]*=\(.*\)/\1/g' > 
/tmp/java_opts.txt
-readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt
+for v in "${!SPARK_JAVA_OPT_@}"; do
+SPARK_EXECUTOR_JAVA_OPTS+=( "${!v}" )
+done
 
 if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
   SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
diff --git a/entrypoint.sh.template b/entrypoint.sh.template
index 4bb1557..716f1af 100644
--- a/entrypoint.sh.template
+++ b/entrypoint.sh.template
@@ -38,8 +38,9 @@ if [ -z "$JAVA_HOME" ]; then
 fi
 
 SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
-env | grep SPARK_JAVA_OPT_ | sort -t_ -k4 -n | sed 's/[^=]*=\(.*\)/\1/g' > 
/tmp/java_opts.txt
-readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt
+for v in "${!SPARK_JAVA_OPT_@}"; do
+SPARK_EXECUTOR_JAVA_OPTS+=( "${!v}" )
+done
 
 if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
   SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-docker] branch master updated: [SPARK-43365][FOLLOWUP] Refactor publish workflow based on base image

2023-05-25 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new f2d2b2d  [SPARK-43365][FOLLOWUP] Refactor publish workflow based on 
base image
f2d2b2d is described below

commit f2d2b2d1ffbb951aed29221a040861327c09441e
Author: Yikun Jiang 
AuthorDate: Thu May 25 16:13:44 2023 +0800

[SPARK-43365][FOLLOWUP] Refactor publish workflow based on base image

### What changes were proposed in this pull request?
- This patch changes the `build-args` approach to `patch in test` in the build 
and publish workflows, because Docker official images do not support 
**parameterized FROM** values. 
https://github.com/docker-library/official-images/pull/13089#issuecomment-1555352902
- And also refactors the publish workflow:

![image](https://user-images.githubusercontent.com/1736354/236613626-96f8fbf6-7df7-4d10-b4fb-be4d57c56dce.png)
### Why are the changes needed?
Same change as the build workflow refactor, to avoid publish issues like:
```
#5 [linux/amd64 internal] load metadata for 
docker.io/library/spark:3.4.0-scala2.12-java11-ubuntu
#5 ERROR: pull access denied, repository does not exist or may require 
authorization: server message: insufficient_scope: authorization failed
--
 > [linux/amd64 internal] load metadata for 
docker.io/library/spark:3.4.0-scala2.12-java11-ubuntu:
--
Dockerfile:18

  16 | #
  17 | ARG BASE_IMAGE=spark:3.4.0-scala2.12-java11-ubuntu
  18 | >>> FROM $BASE_IMAGE
  19 |
  20 | RUN set -ex && \

ERROR: failed to solve: spark:3.4.0-scala2.12-java11-ubuntu: pull access 
denied, repository does not exist or may require authorization: server message: 
insufficient_scope: authorization failed
Error: buildx failed with: ERROR: failed to solve: 
spark:3.4.0-scala2.12-java11-ubuntu: pull access denied, repository does not 
exist or may require authorization: server message: insufficient_scope: 
authorization failed
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Publish test in my local fork:
- 
https://github.com/Yikun/spark-docker/actions/runs/5076986823/jobs/9120029759: 
skips the local base build and uses the [published 
base](https://github.com/Yikun/spark-docker/actions/runs/5076986823/jobs/9120029759#step:11:135)
 image:


![image](https://user-images.githubusercontent.com/1736354/236612540-2b454c14-e194-4d73-b859-0df001570d27.png)

```
#3 [linux/amd64 internal] load metadata for 
ghcr.io/yikun/spark-docker/spark:3.4.0-scala2.12-java11-ubuntu
#3 DONE 0.9s

#4 [linux/arm64 internal] load metadata for 
ghcr.io/yikun/spark-docker/spark:3.4.0-scala2.12-java11-ubuntu
#4 DONE 0.9s
```

- CI passed: does the local base build first and builds on top of the local build

Closes #39 from Yikun/publish-build.
    
    Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/main.yml | 21 --
 .github/workflows/publish.yml  | 25 +-
 3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile |  3 +--
 3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile   |  3 +--
 3.4.0/scala2.12-java11-r-ubuntu/Dockerfile |  3 +--
 r-python.template  |  3 +--
 6 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index c1d0c56..870c8c7 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -107,6 +107,9 @@ jobs:
 TEST_REPO=${{ inputs.repository }}
 UNIQUE_IMAGE_TAG=${{ inputs.image-tag }}
   fi
+
+  # We can't use the real image for build because we haven't publish 
the image yet.
+  # The base image for build, it's something like 
localhost:5000/$REPO_OWNER/spark-docker/spark:3.3.0-scala2.12-java11-ubuntu
   BASE_IMAGE_URL=$TEST_REPO/$IMAGE_NAME:$BASE_IMGAE_TAG
   IMAGE_URL=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG
 
@@ -157,7 +160,8 @@ jobs:
   driver-opts: network=host
 
   - name: Build - Build the base image
-if: ${{ inputs.build }}
+# Don't need to build the base image when publish
+if: ${{ inputs.build && !inputs.publish }}
 uses: docker/build-push-action@v3
 with:
   context: ${{ env.BASE_IMAGE_PATH }}
@@ -165,11 +169,24 @@ jobs:
   platforms: linux/amd64,linux/arm64
   push: true
 
+  - name: Build - Use the test image repo when build
+# Don't need to build the base image when publish
+if: ${{ inputs.build && !inputs.publish }}
+working-directory

[spark-docker] branch master updated: [SPARK-43372] Use ; instead of && when enable set -ex

2023-05-07 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 7f9b414  [SPARK-43372] Use ; instead of && when enable set -ex
7f9b414 is described below

commit 7f9b414de48639d69c64acfd81e6792517b86f61
Author: Yikun Jiang 
AuthorDate: Mon May 8 11:19:36 2023 +0800

[SPARK-43372] Use ; instead of && when enable set -ex

### What changes were proposed in this pull request?
- Use ; instead of && when enable set -ex
- ./add-dockerfiles.sh 3.4.0 to apply changes

### Why are the changes needed?
Address DOI comments: `9. using set -ex means you can use ; instead of && 
(really only matters for complex expressions, like the || in the later RUN that 
does use ;)`


https://github.com/docker-library/official-images/pull/13089#issuecomment-1533540388
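
As a quick illustration (not from the commit): with `set -e` active, a failing
command aborts the shell even when commands are joined by `;`, so the `&&`
chaining adds no extra safety inside these RUN steps:

```bash
# Illustration only: behavior of `;` under `set -e` in a RUN step's shell.
set -ex
echo "step 1"; \
false; \
echo "never reached"   # the shell exits at `false`, exactly as with &&
```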

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes #38 from Yikun/SPARK-43372.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 10 +++
 3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile   |  8 +++---
 3.4.0/scala2.12-java11-r-ubuntu/Dockerfile |  8 +++---
 3.4.0/scala2.12-java11-ubuntu/Dockerfile   | 32 +++---
 Dockerfile.template| 32 +++---
 r-python.template  | 10 +++
 6 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
index 86337c5..12c7a4f 100644
--- a/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -17,11 +17,11 @@
 ARG BASE_IMAGE=spark:3.4.0-scala2.12-java11-ubuntu
 FROM $BASE_IMAGE
 
-RUN set -ex && \
-apt-get update && \
-apt install -y python3 python3-pip && \
-apt install -y r-base r-base-dev && \
-rm -rf /var/cache/apt/* && \
+RUN set -ex; \
+apt-get update; \
+apt install -y python3 python3-pip; \
+apt install -y r-base r-base-dev; \
+rm -rf /var/cache/apt/*; \
 rm -rf /var/lib/apt/lists/*
 
 ENV R_HOME /usr/lib/R
diff --git a/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile
index 540805f..1f0dd1f 100644
--- a/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile
@@ -17,8 +17,8 @@
 ARG BASE_IMAGE=spark:3.4.0-scala2.12-java11-ubuntu
 FROM $BASE_IMAGE
 
-RUN set -ex && \
-apt-get update && \
-apt install -y python3 python3-pip && \
-rm -rf /var/cache/apt/* && \
+RUN set -ex; \
+apt-get update; \
+apt install -y python3 python3-pip; \
+rm -rf /var/cache/apt/*; \
 rm -rf /var/lib/apt/lists/*
diff --git a/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile
index c65c2ce..53647b2 100644
--- a/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile
@@ -17,10 +17,10 @@
 ARG BASE_IMAGE=spark:3.4.0-scala2.12-java11-ubuntu
 FROM $BASE_IMAGE
 
-RUN set -ex && \
-apt-get update && \
-apt install -y r-base r-base-dev && \
-rm -rf /var/cache/apt/* && \
+RUN set -ex; \
+apt-get update; \
+apt install -y r-base r-base-dev; \
+rm -rf /var/cache/apt/*; \
 rm -rf /var/lib/apt/lists/*
 
 ENV R_HOME /usr/lib/R
diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
index 21d95d4..11f997f 100644
--- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
@@ -21,22 +21,22 @@ ARG spark_uid=185
 RUN groupadd --system --gid=${spark_uid} spark && \
 useradd --system --uid=${spark_uid} --gid=spark spark
 
-RUN set -ex && \
-apt-get update && \
-ln -s /lib /lib64 && \
-apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user 
libnss3 procps net-tools gosu && \
-mkdir -p /opt/spark && \
-mkdir /opt/spark/python && \
-mkdir -p /opt/spark/examples && \
-mkdir -p /opt/spark/work-dir && \
-chmod g+w /opt/spark/work-dir && \
-touch /opt/spark/RELEASE && \
-chown -R spark:spark /opt/spark && \
-rm /bin/sh && \
-ln -sv /bin/bash /bin/sh && \
-echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
-chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
-rm -rf /var/cache/apt/* && \
+RUN set -ex; \
+   

[spark-docker] branch master updated: [SPARK-43371] Minimize duplication across layers for chmod

2023-05-06 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 406eb86  [SPARK-43371] Minimize duplication across layers for chmod
406eb86 is described below

commit 406eb86c2cc722458e0a4787e759802dda5c73eb
Author: Yikun Jiang 
AuthorDate: Sat May 6 17:24:12 2023 +0800

[SPARK-43371] Minimize duplication across layers for chmod

### What changes were proposed in this pull request?
This patch minimizes duplication across layers for chmod:
- Move `chmod g+w /opt/spark/work-dir` to the layer that creates 
`/opt/spark/work-dir`
- Move `chmod a+x /opt/decom.sh` to the Spark extraction layer
- `chmod a+x $VERSION/$TAG/entrypoint.sh` when generating the entrypoint.sh
- `./add-dockerfiles.sh 3.4.0` to apply the changes

### Why are the changes needed?
Address DOI review comments to minimize duplication across layers for chmod
> To minimize duplication across layers, chmod's should be done in the 
layer that creates the file/folder (or in the case of a file from the context 
via COPY, it should have the +x committed to git)


https://github.com/docker-library/official-images/pull/13089#issuecomment-1533540388

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes #37 from Yikun/SPARK-43371.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.4.0/scala2.12-java11-ubuntu/Dockerfile| 5 ++---
 3.4.0/scala2.12-java11-ubuntu/entrypoint.sh | 0
 Dockerfile.template | 5 ++---
 add-dockerfiles.sh  | 1 +
 4 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
index 997b8d3..21d95d4 100644
--- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
@@ -29,6 +29,7 @@ RUN set -ex && \
 mkdir /opt/spark/python && \
 mkdir -p /opt/spark/examples && \
 mkdir -p /opt/spark/work-dir && \
+chmod g+w /opt/spark/work-dir && \
 touch /opt/spark/RELEASE && \
 chown -R spark:spark /opt/spark && \
 rm /bin/sh && \
@@ -68,6 +69,7 @@ RUN set -ex; \
 mv python/pyspark /opt/spark/python/pyspark/; \
 mv python/lib /opt/spark/python/lib/; \
 mv R /opt/spark/; \
+chmod a+x /opt/decom.sh; \
 cd ..; \
 rm -rf "$SPARK_TMP";
 
@@ -76,8 +78,5 @@ COPY entrypoint.sh /opt/
 ENV SPARK_HOME /opt/spark
 
 WORKDIR /opt/spark/work-dir
-RUN chmod g+w /opt/spark/work-dir
-RUN chmod a+x /opt/decom.sh
-RUN chmod a+x /opt/entrypoint.sh
 
 ENTRYPOINT [ "/opt/entrypoint.sh" ]
diff --git a/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh 
b/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh
old mode 100644
new mode 100755
diff --git a/Dockerfile.template b/Dockerfile.template
index 5fe4f25..db01a87 100644
--- a/Dockerfile.template
+++ b/Dockerfile.template
@@ -29,6 +29,7 @@ RUN set -ex && \
 mkdir /opt/spark/python && \
 mkdir -p /opt/spark/examples && \
 mkdir -p /opt/spark/work-dir && \
+chmod g+w /opt/spark/work-dir && \
 touch /opt/spark/RELEASE && \
 chown -R spark:spark /opt/spark && \
 rm /bin/sh && \
@@ -68,6 +69,7 @@ RUN set -ex; \
 mv python/pyspark /opt/spark/python/pyspark/; \
 mv python/lib /opt/spark/python/lib/; \
 mv R /opt/spark/; \
+chmod a+x /opt/decom.sh; \
 cd ..; \
 rm -rf "$SPARK_TMP";
 
@@ -76,8 +78,5 @@ COPY entrypoint.sh /opt/
 ENV SPARK_HOME /opt/spark
 
 WORKDIR /opt/spark/work-dir
-RUN chmod g+w /opt/spark/work-dir
-RUN chmod a+x /opt/decom.sh
-RUN chmod a+x /opt/entrypoint.sh
 
 ENTRYPOINT [ "/opt/entrypoint.sh" ]
diff --git a/add-dockerfiles.sh b/add-dockerfiles.sh
index 7dcd7b0..d61601e 100755
--- a/add-dockerfiles.sh
+++ b/add-dockerfiles.sh
@@ -52,6 +52,7 @@ for TAG in $TAGS; do
 if [ "$TAG" == "scala2.12-java11-ubuntu" ]; then
 python3 tools/template.py $OPTS > $VERSION/$TAG/Dockerfile
 python3 tools/template.py $OPTS -f entrypoint.sh.template > 
$VERSION/$TAG/entrypoint.sh
+chmod a+x $VERSION/$TAG/entrypoint.sh
 else
 python3 tools/template.py $OPTS -f r-python.template > 
$VERSION/$TAG/Dockerfile
 fi


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-docker] branch master updated: [SPARK-43365] Refactor Dockerfile and workflow based on base image

2023-05-05 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 7f83637  [SPARK-43365] Refactor Dockerfile and workflow based on base 
image
7f83637 is described below

commit 7f836378d8bfe453b7e1dba304b54cb1cfacda49
Author: Yikun Jiang 
AuthorDate: Sat May 6 09:15:41 2023 +0800

[SPARK-43365] Refactor Dockerfile and workflow based on base image

### What changes were proposed in this pull request?
This PR changes the Dockerfiles and workflow to build on a base image, saving 
space by sharing layers (deriving one image from another).

After this PR:
- The Spark / PySpark / SparkR related files are extracted in the base image
- Install the PySpark / SparkR deps in the PySpark / SparkR images.
- Add the base image build step
- Apply changes to the template: `./add-dockerfiles.sh 3.4.0` to make it work.
- This PR doesn't contain changes to the 3.3.x Dockerfiles, to keep the PR 
clearer; the 3.3.x changes will be a separate PR once all comments for 3.4.0 
are addressed.

[1] 
https://github.com/docker-library/official-images/pull/13089?notification_referrer_id=NT_kwDOABp-orI0MzIwMzMwNzY5OjE3MzYzNTQ#issuecomment-1533540388

### Why are the changes needed?
Address DOI comments, and also save space by sharing layers (deriving one 
image from another).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed.

Closes #36 from Yikun/official.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/main.yml |  20 
 3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile |  63 +---
 .../entrypoint.sh  | 114 -
 3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile   |  63 +---
 .../scala2.12-java11-python3-ubuntu/entrypoint.sh  | 114 -
 3.4.0/scala2.12-java11-r-ubuntu/Dockerfile |  60 +--
 3.4.0/scala2.12-java11-r-ubuntu/entrypoint.sh  | 107 ---
 3.4.0/scala2.12-java11-ubuntu/Dockerfile   |   4 +
 3.4.0/scala2.12-java11-ubuntu/entrypoint.sh|   7 ++
 Dockerfile.template|  15 ---
 add-dockerfiles.sh |   9 +-
 entrypoint.sh.template |   2 -
 add-dockerfiles.sh => r-python.template|  54 +++---
 tools/template.py  |  16 +++
 14 files changed, 77 insertions(+), 571 deletions(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index fd37990..c1d0c56 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -91,10 +91,12 @@ jobs:
   scala) SUFFIX=ubuntu
   ;;
   esac
+  BASE_IMGAE_TAG=${{ inputs.spark }}-scala${{ inputs.scala }}-java${{ 
inputs.java }}-ubuntu
   TAG=scala${{ inputs.scala }}-java${{ inputs.java }}-$SUFFIX
 
   IMAGE_NAME=spark
   IMAGE_PATH=${{ inputs.spark }}/$TAG
+  BASE_IMAGE_PATH=${{ inputs.spark }}/scala${{ inputs.scala }}-java${{ 
inputs.java }}-ubuntu
   if [ "${{ inputs.build }}" == "true" ]; then
 # Use the local registry to build and test
 REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr 
'[:upper:]' '[:lower:]')
@@ -105,6 +107,7 @@ jobs:
 TEST_REPO=${{ inputs.repository }}
 UNIQUE_IMAGE_TAG=${{ inputs.image-tag }}
   fi
+  BASE_IMAGE_URL=$TEST_REPO/$IMAGE_NAME:$BASE_IMGAE_TAG
   IMAGE_URL=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG
 
   PUBLISH_REPO=${{ inputs.repository }}
@@ -116,8 +119,12 @@ jobs:
   echo "TEST_REPO=${TEST_REPO}" >> $GITHUB_ENV
   # Image name: spark
   echo "IMAGE_NAME=${IMAGE_NAME}" >> $GITHUB_ENV
+  # Base Image Dockerfile: 3.3.0/scala2.12-java11-ubuntu
+  echo "BASE_IMAGE_PATH=${BASE_IMAGE_PATH}" >> $GITHUB_ENV
   # Image dockerfile path: 3.3.0/scala2.12-java11-python3-ubuntu
   echo "IMAGE_PATH=${IMAGE_PATH}" >> $GITHUB_ENV
+  # Base Image URL: spark:3.3.0-scala2.12-java11-ubuntu
+  echo "BASE_IMAGE_URL=${BASE_IMAGE_URL}" >> $GITHUB_ENV
   # Image URL: 
ghcr.io/apache/spark-docker/spark:3.3.0-scala2.12-java11-python3-ubuntu
   echo "IMAGE_URL=${IMAGE_URL}" >> $GITHUB_ENV
 
@@ -132,6 +139,9 @@ jobs:
   echo "IMAGE_PATH: "${IMAGE_PATH}
   echo "IMAGE_URL: "${IMAGE_URL}
 
+  echo "BASE_IMAGE_PATH: "${BASE_IMAGE_PATH}
+  echo "BASE_IMAGE_URL: "${BASE_IMAGE_URL}
+
 

[spark-docker] branch master updated: [SPARK-43148] Add Apache Spark 3.4.0 Dockerfiles

2023-04-17 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new fe05e38  [SPARK-43148] Add Apache Spark 3.4.0 Dockerfiles
fe05e38 is described below

commit fe05e38f0ffad271edccd6ae40a77d5f14f3eef7
Author: Yikun Jiang 
AuthorDate: Tue Apr 18 10:58:59 2023 +0800

[SPARK-43148] Add Apache Spark 3.4.0 Dockerfiles

### What changes were proposed in this pull request?
Add Apache Spark 3.4.0 Dockerfiles.
- Add 3.4.0 GPG key
- Add .github/workflows/build_3.4.0.yaml
- ./add-dockerfiles.sh 3.4.0

### Why are the changes needed?
Apache Spark 3.4.0 released:
https://spark.apache.org/releases/spark-release-3-4-0.html

### Does this PR introduce _any_ user-facing change?
Yes, in the future: new images will be published (after DOI review)

### How was this patch tested?
Added the workflow; CI passed

Closes #33 from Yikun/3.4.0.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/build_3.4.0.yaml |  43 
 .github/workflows/publish.yml  |   6 +-
 .github/workflows/test.yml |   6 +-
 3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile |  86 
 .../entrypoint.sh  | 114 +
 3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile   |  83 +++
 .../scala2.12-java11-python3-ubuntu/entrypoint.sh  | 114 +
 3.4.0/scala2.12-java11-r-ubuntu/Dockerfile |  82 +++
 3.4.0/scala2.12-java11-r-ubuntu/entrypoint.sh  | 107 +++
 3.4.0/scala2.12-java11-ubuntu/Dockerfile   |  79 ++
 3.4.0/scala2.12-java11-ubuntu/entrypoint.sh| 107 +++
 tools/template.py  |   2 +
 versions.json  |  42 ++--
 13 files changed, 860 insertions(+), 11 deletions(-)

diff --git a/.github/workflows/build_3.4.0.yaml 
b/.github/workflows/build_3.4.0.yaml
new file mode 100644
index 000..8dd4e1e
--- /dev/null
+++ b/.github/workflows/build_3.4.0.yaml
@@ -0,0 +1,43 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build and Test (3.4.0)"
+
+on:
+  pull_request:
+branches:
+  - 'master'
+paths:
+  - '3.4.0/**'
+  - '.github/workflows/build_3.4.0.yaml'
+  - '.github/workflows/main.yml'
+
+jobs:
+  run-build:
+strategy:
+  matrix:
+image-type: ["all", "python", "scala", "r"]
+name: Run
+secrets: inherit
+uses: ./.github/workflows/main.yml
+with:
+  spark: 3.4.0
+  scala: 2.12
+  java: 11
+  image-type: ${{ matrix.image-type }}
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
index 2941cfb..70b88b8 100644
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -25,11 +25,13 @@ on:
   spark:
 description: 'The Spark version of Spark image.'
 required: true
-default: '3.3.0'
+default: '3.4.0'
 type: choice
 options:
-- 3.3.0
+- 3.4.0
+- 3.3.2
 - 3.3.1
+- 3.3.0
   publish:
 description: 'Publish the image or not.'
 default: false
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index efb401b..06e2321 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -25,11 +25,13 @@ on:
   spark:
 description: 'The Spark version of Spark image.'
 required: true
-default: '3.3.1'
+default: '3.4.0'
 type: choice
 options:
-- 3.3.0
+- 3.4.0
+- 3.3.2
 - 3.3.1
+- 3.3.0
   java:
 description: 'The Java version of Spark image.'
 default: 11
diff --git a/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
new file mode 100644
index 00

[spark-docker] branch master updated: [SPARK-42505] Apply entrypoint template change to 3.3.0/3.3.1

2023-02-21 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 02bc905  [SPARK-42505] Apply entrypoint template change to 3.3.0/3.3.1
02bc905 is described below

commit 02bc9054d757f8defbc2baf6af1d2a9aa84b2b35
Author: Yikun Jiang 
AuthorDate: Tue Feb 21 17:02:29 2023 +0800

[SPARK-42505] Apply entrypoint template change to 3.3.0/3.3.1

### What changes were proposed in this pull request?
Apply entrypoint template change to 3.3.0/3.3.1

### Why are the changes needed?
We removed the redundant PySpark-related vars in 
https://github.com/apache/spark-docker/commit/e8f5b0a1151c349d9c7fdb09cf76300b42a6946b
. This change should also be applied to 3.3.0/3.3.1.
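
(Aside, for readers of the removed hunks below: the `${VAR+x}` guard expands to
`x` only when `VAR` is set, even to an empty string, so the `export` ran only
for explicitly provided variables. A quick illustration:)

```bash
# Illustration of the ${VAR+x} "is set" test used in the removed guards.
unset PYSPARK_PYTHON
[ -z "${PYSPARK_PYTHON+x}" ] && echo "unset"             # prints: unset

PYSPARK_PYTHON=""
[ -z "${PYSPARK_PYTHON+x}" ] || echo "set (even empty)"  # prints: set (even empty)
```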

### Does this PR introduce _any_ user-facing change?
No, because the images haven't been published yet.

### How was this patch tested?
CI for 3.3.0/3.3.1 passed

Closes #31 from Yikun/SPARK-42505.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh | 7 ---
 3.3.0/scala2.12-java11-ubuntu/entrypoint.sh   | 7 ---
 3.3.1/scala2.12-java11-r-ubuntu/entrypoint.sh | 7 ---
 3.3.1/scala2.12-java11-ubuntu/entrypoint.sh   | 7 ---
 4 files changed, 28 deletions(-)

diff --git a/3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh 
b/3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh
index 4bb1557..159d539 100644
--- a/3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh
+++ b/3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh
@@ -45,13 +45,6 @@ if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
   SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
 fi
 
-if ! [ -z ${PYSPARK_PYTHON+x} ]; then
-export PYSPARK_PYTHON
-fi
-if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then
-export PYSPARK_DRIVER_PYTHON
-fi
-
 # If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so 
Hadoop jars are available to the executor.
 # It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding 
customizations of this value from elsewhere e.g. Docker/K8s.
 if [ -n "${HADOOP_HOME}"  ] && [ -z "${SPARK_DIST_CLASSPATH}"  ]; then
diff --git a/3.3.0/scala2.12-java11-ubuntu/entrypoint.sh 
b/3.3.0/scala2.12-java11-ubuntu/entrypoint.sh
index 4bb1557..159d539 100644
--- a/3.3.0/scala2.12-java11-ubuntu/entrypoint.sh
+++ b/3.3.0/scala2.12-java11-ubuntu/entrypoint.sh
@@ -45,13 +45,6 @@ if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
   SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
 fi
 
-if ! [ -z ${PYSPARK_PYTHON+x} ]; then
-export PYSPARK_PYTHON
-fi
-if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then
-export PYSPARK_DRIVER_PYTHON
-fi
-
 # If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so 
Hadoop jars are available to the executor.
 # It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding 
customizations of this value from elsewhere e.g. Docker/K8s.
 if [ -n "${HADOOP_HOME}"  ] && [ -z "${SPARK_DIST_CLASSPATH}"  ]; then
diff --git a/3.3.1/scala2.12-java11-r-ubuntu/entrypoint.sh 
b/3.3.1/scala2.12-java11-r-ubuntu/entrypoint.sh
index 4bb1557..159d539 100644
--- a/3.3.1/scala2.12-java11-r-ubuntu/entrypoint.sh
+++ b/3.3.1/scala2.12-java11-r-ubuntu/entrypoint.sh
@@ -45,13 +45,6 @@ if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
   SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
 fi
 
-if ! [ -z ${PYSPARK_PYTHON+x} ]; then
-export PYSPARK_PYTHON
-fi
-if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then
-export PYSPARK_DRIVER_PYTHON
-fi
-
 # If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so 
Hadoop jars are available to the executor.
 # It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding 
customizations of this value from elsewhere e.g. Docker/K8s.
 if [ -n "${HADOOP_HOME}"  ] && [ -z "${SPARK_DIST_CLASSPATH}"  ]; then
diff --git a/3.3.1/scala2.12-java11-ubuntu/entrypoint.sh 
b/3.3.1/scala2.12-java11-ubuntu/entrypoint.sh
index 4bb1557..159d539 100644
--- a/3.3.1/scala2.12-java11-ubuntu/entrypoint.sh
+++ b/3.3.1/scala2.12-java11-ubuntu/entrypoint.sh
@@ -45,13 +45,6 @@ if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
   SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
 fi
 
-if ! [ -z ${PYSPARK_PYTHON+x} ]; then
-export PYSPARK_PYTHON
-fi
-if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then
-export PYSPARK_DRIVER_PYTHON
-fi
-
 # If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so 
Hadoop jars are available to the executor.
 # It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding 
customizations of this value from elsewhere e.g. Docker/K8s.
 if [ -n "${HADOOP_HOME}"  ] &

[spark-docker] branch master updated: [SPARK-42494] Add official image Dockerfile for Spark v3.3.2

2023-02-20 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new e8f5b0a  [SPARK-42494] Add official image Dockerfile for Spark v3.3.2
e8f5b0a is described below

commit e8f5b0a1151c349d9c7fdb09cf76300b42a6946b
Author: Yikun Jiang 
AuthorDate: Tue Feb 21 14:22:19 2023 +0800

[SPARK-42494] Add official image Dockerfile for Spark v3.3.2

### What changes were proposed in this pull request?
Add Apache Spark 3.3.2 Dockerfiles.
- Add 3.3.2 GPG key
- Add .github/workflows/build_3.3.2.yaml
- ./add-dockerfiles.sh 3.3.2

### Why are the changes needed?
Apache Spark 3.3.2 released.

https://lists.apache.org/thread/k8skf16wyn6rg9n0vd0t6l3bhw7c9svq

### Does this PR introduce _any_ user-facing change?
Yes, in the future: new images will be published (after DOI review)

### How was this patch tested?
Added the workflow; CI passed

Closes #30 from Yikun/SPARK-42494.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/build_3.3.2.yaml | 43 +++
 3.3.2/scala2.12-java11-python3-r-ubuntu/Dockerfile | 86 ++
 .../entrypoint.sh  |  0
 3.3.2/scala2.12-java11-python3-ubuntu/Dockerfile   | 83 +
 .../scala2.12-java11-python3-ubuntu/entrypoint.sh  |  0
 3.3.2/scala2.12-java11-r-ubuntu/Dockerfile | 82 +
 .../scala2.12-java11-r-ubuntu/entrypoint.sh|  7 --
 3.3.2/scala2.12-java11-ubuntu/Dockerfile   | 79 
 .../scala2.12-java11-ubuntu/entrypoint.sh  |  7 --
 add-dockerfiles.sh |  2 +-
 entrypoint.sh.template |  2 +
 tools/template.py  |  2 +
 12 files changed, 378 insertions(+), 15 deletions(-)

diff --git a/.github/workflows/build_3.3.2.yaml 
b/.github/workflows/build_3.3.2.yaml
new file mode 100644
index 000..9ae1a13
--- /dev/null
+++ b/.github/workflows/build_3.3.2.yaml
@@ -0,0 +1,43 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build and Test (3.3.2)"
+
+on:
+  pull_request:
+branches:
+  - 'master'
+paths:
+  - '3.3.2/**'
+  - '.github/workflows/build_3.3.2.yaml'
+  - '.github/workflows/main.yml'
+
+jobs:
+  run-build:
+strategy:
+  matrix:
+image-type: ["all", "python", "scala", "r"]
+name: Run
+secrets: inherit
+uses: ./.github/workflows/main.yml
+with:
+  spark: 3.3.2
+  scala: 2.12
+  java: 11
+  image-type: ${{ matrix.image-type }}
diff --git a/3.3.2/scala2.12-java11-python3-r-ubuntu/Dockerfile 
b/3.3.2/scala2.12-java11-python3-r-ubuntu/Dockerfile
new file mode 100644
index 000..b518021
--- /dev/null
+++ b/3.3.2/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -0,0 +1,86 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+FROM eclipse-temurin:11-jre-focal
+
+ARG spark_uid=185
+
+RUN groupadd --system --gid=${spark_uid} spark && \
+useradd --system --uid=${spark_uid} --gid=spark spark
+
+RUN set -ex && \
+apt-get update && \
+ln -s /lib /lib64 && \
+apt insta

[spark] branch master updated: [SPARK-42214][INFRA] Enable infra image build for scheduled job

2023-01-28 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f348d4fb9ff [SPARK-42214][INFRA] Enable infra image build for 
scheduled job
f348d4fb9ff is described below

commit f348d4fb9ffabc490b7c5294cd15eed2a74f2b60
Author: Yikun Jiang 
AuthorDate: Sat Jan 28 18:01:57 2023 +0800

[SPARK-42214][INFRA] Enable infra image build for scheduled job

### What changes were proposed in this pull request?
Enable infra image build for scheduled job.

The branch scheduled jobs are based on the master branch workflow, so we need to 
enable the infra image for the master branch and branches 3.4+ (except 3.2/3.3).

### Why are the changes needed?
Enable infra image build for scheduled job.

### Does this PR introduce _any_ user-facing change?
No, infra only

### How was this patch tested?
- CI passed (to make sure master branch job passed)
- Manually review and check the scheduled job after merge:
https://github.com/apache/spark/actions/workflows/build_branch34.yml

Closes #39778 from Yikun/SPARK-42214.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/build_and_test.yml | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 54b3d1d19d4..021566a5b8e 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -58,8 +58,8 @@ jobs:
   required: ${{ steps.set-outputs.outputs.required }}
   image_url: >-
 ${{
-  (inputs.branch == 'master' && 
steps.infra-image-outputs.outputs.image_url)
-  || 'dongjoon/apache-spark-github-action-image:20220207'
+  ((inputs.branch == 'branch-3.2' || inputs.branch == 'branch-3.3') && 
'dongjoon/apache-spark-github-action-image:20220207')
+  || steps.infra-image-outputs.outputs.image_url
 }}
 steps:
 - name: Checkout Spark repository
@@ -268,12 +268,12 @@ jobs:
   infra-image:
 name: "Base image build"
 needs: precondition
-# Currently, only enable docker build from cache for `master` branch jobs
+# Currently, enable docker build from cache for `master` and branch (since 
3.4) jobs
 if: >-
   (fromJson(needs.precondition.outputs.required).pyspark == 'true' ||
   fromJson(needs.precondition.outputs.required).lint == 'true' ||
   fromJson(needs.precondition.outputs.required).sparkr == 'true') &&
-  inputs.branch == 'master'
+  (inputs.branch != 'branch-3.2' && inputs.branch != 'branch-3.3')
 runs-on: ubuntu-latest
 permissions:
   packages: write


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-docker] branch master updated: [SPARK-40520] Add support to generate DOI manifest

2022-12-20 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 7bb8661  [SPARK-40520] Add support to generate DOI manifest
7bb8661 is described below

commit 7bb8661f7d57356f94fd5874696df1b1c058cb0b
Author: Yikun Jiang 
AuthorDate: Wed Dec 21 10:15:44 2022 +0800

[SPARK-40520] Add support to generate DOI manifest

### What changes were proposed in this pull request?
This patch adds support to generate the DOI manifest from versions.json.

### Why are the changes needed?
To help generate the DOI manifest

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
```shell
$ flake8 ./tools/manifest.py --max-line-length=100
$ black ./tools/manifest.py
All done! ✨  ✨
1 file left unchanged.
```

```shell
$ tools/manifest.py manifest
Maintainers: Apache Spark Developers  (ApacheSpark)
GitRepo: https://github.com/apache/spark-docker.git

Tags: 3.3.1-scala2.12-java11-python3-ubuntu, 3.3.1-python3, 3.3.1, python3, 
latest
Architectures: amd64, arm64v8
GitCommit: 496edb6dee0ade08bc5d180d7a6da0ff8b5d91ff
Directory: ./3.3.1/scala2.12-java11-python3-ubuntu

Tags: 3.3.1-scala2.12-java11-r-ubuntu, 3.3.1-r, r
Architectures: amd64, arm64v8
GitCommit: 496edb6dee0ade08bc5d180d7a6da0ff8b5d91ff
Directory: ./3.3.1/scala2.12-java11-r-ubuntu

// ... ...
```

Closes #27 from Yikun/SPARK-40520.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 tools/manifest.py | 34 --
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/tools/manifest.py b/tools/manifest.py
index fbfad6f..13bc631 100755
--- a/tools/manifest.py
+++ b/tools/manifest.py
@@ -19,7 +19,33 @@
 
 from argparse import ArgumentParser
 import json
-from statistics import mode
+import subprocess
+
+
+def run_cmd(cmd):
+if isinstance(cmd, list):
+return subprocess.check_output(cmd).decode("utf-8")
+else:
+return subprocess.check_output(cmd.split(" ")).decode("utf-8")
+
+
+def generate_manifest(versions):
+output = (
+"Maintainers: Apache Spark Developers  (@ApacheSpark)\n"
+"GitRepo: https://github.com/apache/spark-docker.git\n\n"
+)
+git_commit = run_cmd("git rev-parse HEAD").replace("\n", "")
+content = (
+"Tags: %s\n"
+"Architectures: amd64, arm64v8\n"
+"GitCommit: %s\n"
+"Directory: ./%s\n\n"
+)
+for version in versions:
+tags = ", ".join(version["tags"])
+path = version["path"]
+output += content % (tags, git_commit, path)
+return output
 
 
 def parse_opts():
@@ -27,7 +53,7 @@ def parse_opts():
 
 parser.add_argument(
 dest="mode",
-choices=["tags"],
+choices=["tags", "manifest"],
 type=str,
 help="The print mode of script",
 )
@@ -76,6 +102,10 @@ def main():
 # Get matched version's tags
 tags = versions[0]["tags"] if versions else []
 print(",".join(["%s:%s" % (image, t) for t in tags]))
+elif mode == "manifest":
+with open(version_file, "r") as f:
+versions = json.load(f).get("versions")
+print(generate_manifest(versions))
 
 
 if __name__ == "__main__":


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-40270][PS][FOLLOWUP][3.2] Skip test_style when pandas <1.3.0

2022-12-09 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 43402fdeb09 [SPARK-40270][PS][FOLLOWUP][3.2] Skip test_style when 
pandas <1.3.0
43402fdeb09 is described below

commit 43402fdeb0942e518ec7f5561ddf3690ae5cac27
Author: Yikun Jiang 
AuthorDate: Fri Dec 9 22:15:48 2022 +0800

[SPARK-40270][PS][FOLLOWUP][3.2] Skip test_style when pandas <1.3.0

### What changes were proposed in this pull request?
According to 
https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.to_latex.html:
`pandas.io.formats.style.Styler.to_latex` was introduced in pandas 1.3.0, so for 
pandas older than 1.3.0 the check should be skipped

```
ERROR [0.180s]: test_style 
(pyspark.pandas.tests.test_dataframe.DataFrameTest)
--
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_dataframe.py", 
line 5795, in test_style
check_style()
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_dataframe.py", 
line 5793, in check_style
self.assert_eq(pdf_style.to_latex(), psdf_style.to_latex())
AttributeError: 'Styler' object has no attribute 'to_latex'
```

Related: 
https://github.com/apache/spark/commit/58375a86e6ff49c5bcee49939fbd98eb848ae59f

### Why are the changes needed?
This test breaks the branch-3.2 PySpark tests (with Python 3.6 + pandas 
1.1.x), so it's better to add `skipIf` to it.

See also https://github.com/apache/spark/pull/38982#issuecomment-1343923114

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed
    
    Closes #39008 from Yikun/branch-3.2-style-check.
    
Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 python/pyspark/pandas/tests/test_dataframe.py | 4 
 1 file changed, 4 insertions(+)

diff --git a/python/pyspark/pandas/tests/test_dataframe.py 
b/python/pyspark/pandas/tests/test_dataframe.py
index b4187d59ae7..15cadbebdb6 100644
--- a/python/pyspark/pandas/tests/test_dataframe.py
+++ b/python/pyspark/pandas/tests/test_dataframe.py
@@ -5774,6 +5774,10 @@ class DataFrameTest(PandasOnSparkTestCase, SQLTestUtils):
 for value_psdf, value_pdf in zip(psdf, pdf):
 self.assert_eq(value_psdf, value_pdf)
 
+@unittest.skipIf(
+LooseVersion(pd.__version__) < LooseVersion("1.3.0"),
+"pandas support `Styler.to_latex` since 1.3.0",
+)
 def test_style(self):
 # Currently, the `style` function returns a pandas object `Styler` as 
it is,
 # processing only the number of rows declared in `compute.max_rows`.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.3 updated: [SPARK-40270][PS][FOLLOWUP][3.3] Skip test_style when pandas <1.3.0

2022-12-09 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new b6c6526e3b1 [SPARK-40270][PS][FOLLOWUP][3.3] Skip test_style when 
pandas <1.3.0
b6c6526e3b1 is described below

commit b6c6526e3b1c5bd32b010a38cb0f4faeba678e22
Author: Yikun Jiang 
AuthorDate: Fri Dec 9 22:13:09 2022 +0800

[SPARK-40270][PS][FOLLOWUP][3.3] Skip test_style when pandas <1.3.0

### What changes were proposed in this pull request?
According to 
https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.to_latex.html:
`pandas.io.formats.style.Styler.to_latex` was introduced in pandas 1.3.0, so for 
pandas older than 1.3.0 the check should be skipped

```
ERROR [0.180s]: test_style 
(pyspark.pandas.tests.test_dataframe.DataFrameTest)
--
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_dataframe.py", 
line 5795, in test_style
check_style()
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_dataframe.py", 
line 5793, in check_style
self.assert_eq(pdf_style.to_latex(), psdf_style.to_latex())
AttributeError: 'Styler' object has no attribute 'to_latex'
```

Related: 
https://github.com/apache/spark/commit/58375a86e6ff49c5bcee49939fbd98eb848ae59f

### Why are the changes needed?
This test breaks the branch-3.2 PySpark tests (with Python 3.6 + pandas 
1.1.x), so it's better to add `skipIf` to it.

See also https://github.com/apache/spark/pull/38982#issuecomment-1343923114

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- CI passed
    
    Closes #39007 from Yikun/branch-3.3-check.
    
Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 python/pyspark/pandas/tests/test_dataframe.py | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/python/pyspark/pandas/tests/test_dataframe.py 
b/python/pyspark/pandas/tests/test_dataframe.py
index 0a7eda77564..0c23bf07a69 100644
--- a/python/pyspark/pandas/tests/test_dataframe.py
+++ b/python/pyspark/pandas/tests/test_dataframe.py
@@ -6375,6 +6375,10 @@ class DataFrameTest(ComparisonTestBase, SQLTestUtils):
 psdf = ps.from_pandas(pdf)
 self.assert_eq(pdf.cov(), psdf.cov())
 
+@unittest.skipIf(
+LooseVersion(pd.__version__) < LooseVersion("1.3.0"),
+"pandas support `Styler.to_latex` since 1.3.0",
+)
 def test_style(self):
 # Currently, the `style` function returns a pandas object `Styler` as 
it is,
 # processing only the number of rows declared in `compute.max_rows`.





[spark] branch master updated: [SPARK-40270][PS][FOLLOWUP] Skip test_style when pandas <1.3.0

2022-12-09 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dd0bd0762b3 [SPARK-40270][PS][FOLLOWUP] Skip test_style when pandas 
<1.3.0
dd0bd0762b3 is described below

commit dd0bd0762b344ab34e1b08c9bbd2ac77b83856e0
Author: Yikun Jiang 
AuthorDate: Fri Dec 9 22:11:03 2022 +0800

[SPARK-40270][PS][FOLLOWUP] Skip test_style when pandas <1.3.0

### What changes were proposed in this pull request?
According to 
https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.to_latex.html:
`pandas.io.formats.style.Styler.to_latex` was introduced in 1.3.0, so the 
check should be skipped before pandas 1.3.0. Otherwise the test fails with:

```
ERROR [0.180s]: test_style 
(pyspark.pandas.tests.test_dataframe.DataFrameTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_dataframe.py", 
line 5795, in test_style
check_style()
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_dataframe.py", 
line 5793, in check_style
self.assert_eq(pdf_style.to_latex(), psdf_style.to_latex())
AttributeError: 'Styler' object has no attribute 'to_latex'
```

Related: 
https://github.com/apache/spark/commit/58375a86e6ff49c5bcee49939fbd98eb848ae59f

### Why are the changes needed?
This test breaks the branch-3.2 PySpark tests (with Python 3.6 + pandas 
1.1.x), so it is better to guard it with `skipIf`.

See also https://github.com/apache/spark/pull/38982#issuecomment-1343923114

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- CI passed
- Test on 3.2 branch: https://github.com/Yikun/spark/pull/194, 
https://github.com/Yikun/spark/actions/runs/3655564439/jobs/6177030747
    
    Closes #39002 from Yikun/skip-check.
    
Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 python/pyspark/pandas/tests/test_dataframe.py | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/python/pyspark/pandas/tests/test_dataframe.py 
b/python/pyspark/pandas/tests/test_dataframe.py
index 4e80c680b6e..ded110c1231 100644
--- a/python/pyspark/pandas/tests/test_dataframe.py
+++ b/python/pyspark/pandas/tests/test_dataframe.py
@@ -7074,6 +7074,10 @@ class DataFrameTest(ComparisonTestBase, SQLTestUtils):
 psdf = ps.from_pandas(pdf)
 self.assert_eq(pdf.cov(), psdf.cov())
 
+@unittest.skipIf(
+LooseVersion(pd.__version__) < LooseVersion("1.3.0"),
+"pandas support `Styler.to_latex` since 1.3.0",
+)
 def test_style(self):
 # Currently, the `style` function returns a pandas object `Styler` as 
it is,
 # processing only the number of rows declared in `compute.max_rows`.





[spark] 01/01: Update test_dataframe.py

2022-12-09 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch branch-3.2-style-check
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 49d31b0d860da90cf2f4ec696b3220f24355f65e
Author: Yikun Jiang 
AuthorDate: Fri Dec 9 19:46:01 2022 +0800

Update test_dataframe.py
---
 python/pyspark/pandas/tests/test_dataframe.py | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/python/pyspark/pandas/tests/test_dataframe.py 
b/python/pyspark/pandas/tests/test_dataframe.py
index b4187d59ae7..15cadbebdb6 100644
--- a/python/pyspark/pandas/tests/test_dataframe.py
+++ b/python/pyspark/pandas/tests/test_dataframe.py
@@ -5774,6 +5774,10 @@ class DataFrameTest(PandasOnSparkTestCase, SQLTestUtils):
 for value_psdf, value_pdf in zip(psdf, pdf):
 self.assert_eq(value_psdf, value_pdf)
 
+@unittest.skipIf(
+LooseVersion(pd.__version__) < LooseVersion("1.3.0"),
+"pandas support `Styler.to_latex` since 1.3.0",
+)
 def test_style(self):
 # Currently, the `style` function returns a pandas object `Styler` as 
it is,
 # processing only the number of rows declared in `compute.max_rows`.





[spark] branch branch-3.2-style-check created (now 49d31b0d860)

2022-12-09 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a change to branch branch-3.2-style-check
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 49d31b0d860 Update test_dataframe.py

This branch includes the following new commits:

 new 49d31b0d860 Update test_dataframe.py

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] branch branch-3.3 updated: [SPARK-41253][K8S][TESTS] Make Spark K8S volcano IT work in Github Action

2022-12-03 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 821997bec37 [SPARK-41253][K8S][TESTS] Make Spark K8S volcano IT work 
in Github Action
821997bec37 is described below

commit 821997bec3703ec52db9b1deb667e11e76296c48
Author: Yikun Jiang 
AuthorDate: Fri Dec 2 22:44:50 2022 -0800

[SPARK-41253][K8S][TESTS] Make Spark K8S volcano IT work in Github Action

### What changes were proposed in this pull request?
This patch makes the Spark K8s Volcano IT runnable in the resource-limited 
GitHub Actions env. It will help downstream communities like Volcano enable 
the Spark IT tests in GitHub Actions.

BTW, there is no plan to enable the Volcano tests in the Spark community: 
this patch only makes the tests work but **DOES NOT** enable them in the 
Apache Spark GA; it helps downstream testing.

- Change the parallel job number from 4 to 2 (only 1 job in each queue) when 
in the GitHub Actions env.
- Get the specified `spark.kubernetes.[driver|executor].request.cores`.
- Set the queue limit according to the specified 
[driver|executor].request.cores, just like we do in the normal test 
(see the sketch below): 
https://github.com/apache/spark/commit/883a481e44a1f91ef3fc3aea2838a598cbd6cf0f
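
A hedged sketch of the queue sizing implied by these points (the function 
name and the single-executor assumption are illustrative, not from the patch):

```
# Derive a Volcano queue CPU capability from the configured request cores
# so that exactly one Spark job (driver plus executors) fits per queue.
def queue_cpu_capability(driver_cores: float, executor_cores: float,
                         executors_per_job: int = 1) -> float:
    return driver_cores + executor_cores * executors_per_job


# With the request cores used in the GitHub Actions run shown below:
print(queue_cpu_capability(0.5, 0.2))  # 0.7
```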

### Why are the changes needed?

It helps downstream communities that want to use free GitHub Actions hosted 
resources to enable the Spark IT tests in GitHub Actions.

### Does this PR introduce _any_ user-facing change?
No, test only.

### How was this patch tested?
- Test on my local env with enough resource (default):
```
$  build/sbt -Pvolcano -Pkubernetes -Pkubernetes-integration-tests 
-Dtest.include.tags=volcano "kubernetes-integration-tests/test"

[info] KubernetesSuite:
[info] VolcanoSuite:
[info] - Run SparkPi with volcano scheduler (10 seconds, 410 milliseconds)
[info] - SPARK-38187: Run SparkPi Jobs with minCPU (25 seconds, 489 
milliseconds)
[info] - SPARK-38187: Run SparkPi Jobs with minMemory (25 seconds, 518 
milliseconds)
[info] - SPARK-38188: Run SparkPi jobs with 2 queues (only 1 enabled) (14 
seconds, 349 milliseconds)
[info] - SPARK-38188: Run SparkPi jobs with 2 queues (all enabled) (23 
seconds, 516 milliseconds)
[info] - SPARK-38423: Run driver job to validate priority order (16 
seconds, 404 milliseconds)
[info] YuniKornSuite:
[info] Run completed in 2 minutes, 34 seconds.
[info] Total number of tests run: 6
[info] Suites: completed 3, aborted 0
[info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 439 s (07:19), completed 2022-12-3 8:58:50
```

- Test on Github Action with `volcanoMaxConcurrencyJobNum`: 
https://github.com/Yikun/spark/pull/192
```
$ build/sbt -Pvolcano -Psparkr -Pkubernetes -Pkubernetes-integration-tests 
-Dspark.kubernetes.test.driverRequestCores=0.5 
-Dspark.kubernetes.test.executorRequestCores=0.2 
-Dspark.kubernetes.test.volcanoMaxConcurrencyJobNum=1 
-Dtest.include.tags=volcano "kubernetes-integration-tests/test"

[info] VolcanoSuite:
[info] - Run SparkPi with volcano scheduler (18 seconds, 122 milliseconds)
[info] - SPARK-38187: Run SparkPi Jobs with minCPU (53 seconds, 964 
milliseconds)
[info] - SPARK-38187: Run SparkPi Jobs with minMemory (54 seconds, 523 
milliseconds)
[info] - SPARK-38188: Run SparkPi jobs with 2 queues (only 1 enabled) (22 
seconds, 185 milliseconds)
[info] - SPARK-38188: Run SparkPi jobs with 2 queues (all enabled) (33 
seconds, 349 milliseconds)
[info] - SPARK-38423: Run driver job to validate priority order (32 
seconds, 435 milliseconds)
[info] YuniKornSuite:
[info] Run completed in 4 minutes, 16 seconds.
[info] Total number of tests run: 6
[info] Suites: completed 3, aborted 0
[info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[warn] In the last 494 seconds, 7.296 (1.5%) were spent in GC. [Heap: 
3.12GB free of 3.83GB, max 3.83GB] Consider increasing the JVM heap using 
`-Xmx` or try a different collector, e.g. `-XX:+UseG1GC`, for better 
performance.
[success] Total time: 924 s (15:24), completed Dec 3, 2022 12:49:42 AM
```

- CI passed

    Closes #38789 from Yikun/SPARK-41253.
    
    Authored-by: Yikun Jiang 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 72d58d5f8a847bac53cf01b137780c7e4e2664d7)
Signed-off-by: Yikun Jiang 
---
 .../kubernetes/integration-tests/README.md |  8 
 .../volcano/driver-podgroup-template-cpu-2u.yml| 23 --
 .../deploy/k8s/integrationtest/TestConstants.scala |  2 +
 .../k8s/integrationtest/VolcanoTestsSuite.scala| 52 +-
 4 files changed, 51 insertions(+), 3

[spark] branch branch-3.3 updated: [SPARK-38921][K8S][TESTS] Use k8s-client to create queue resource in Volcano IT

2022-12-03 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 20cc2b6104e [SPARK-38921][K8S][TESTS] Use k8s-client to create queue 
resource in Volcano IT
20cc2b6104e is described below

commit 20cc2b6104e1670be3295ed52be54bb40de1b1ce
Author: Yikun Jiang 
AuthorDate: Thu Aug 11 08:28:57 2022 -0700

[SPARK-38921][K8S][TESTS] Use k8s-client to create queue resource in 
Volcano IT

### What changes were proposed in this pull request?
Use fabric8io/k8s-client to create queue resource in Volcano IT.

### Why are the changes needed?
Use the k8s-client to create the Volcano queues to:
- Make the code easier to understand
- Enable the ability to set the queue capacity dynamically (see the sketch 
below). This will help support running the Volcano tests in a 
resource-limited env (such as GitHub Actions).
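
A hedged Python analogue of the idea (the patch itself uses the fabric8 Java 
client from Scala; this sketch assumes a reachable cluster and the 
`kubernetes` package):

```
from kubernetes import client, config


def create_volcano_queue(name: str, cpu_capability: str) -> None:
    # Create a Volcano Queue custom resource with a dynamically chosen
    # CPU capability instead of loading a static YAML file.
    config.load_kube_config()
    body = {
        "apiVersion": "scheduling.volcano.sh/v1beta1",
        "kind": "Queue",
        "metadata": {"name": name},
        "spec": {"weight": 1, "capability": {"cpu": cpu_capability}},
    }
    client.CustomObjectsApi().create_cluster_custom_object(
        group="scheduling.volcano.sh",
        version="v1beta1",
        plural="queues",
        body=body,
    )


# A tiny capability effectively disables a queue, as the removed
# disable-queue.yml below did with cpu "0.001":
# create_volcano_queue("queue", "0.001")
```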

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Volcano IT passed

Closes #36219 from Yikun/SPARK-38921.

Authored-by: Yikun Jiang 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit a49f66fe49d4d4bbfb41da2e5bbb5af4bd64d1da)
Signed-off-by: Yikun Jiang 
---
 .../src/test/resources/volcano/disable-queue.yml   | 24 ---
 .../volcano/disable-queue0-enable-queue1.yml   | 31 -
 .../volcano/driver-podgroup-template-cpu-2u.yml|  2 +-
 .../volcano/driver-podgroup-template-memory-3g.yml |  2 +-
 .../src/test/resources/volcano/enable-queue.yml| 24 ---
 .../volcano/enable-queue0-enable-queue1.yml| 29 -
 .../src/test/resources/volcano/queue-2u-3g.yml | 25 
 .../k8s/integrationtest/VolcanoTestsSuite.scala| 74 +++---
 8 files changed, 52 insertions(+), 159 deletions(-)

diff --git 
a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue.yml
 
b/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue.yml
deleted file mode 100644
index d9f8c36471e..000
--- 
a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue.yml
+++ /dev/null
@@ -1,24 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-apiVersion: scheduling.volcano.sh/v1beta1
-kind: Queue
-metadata:
-  name: queue
-spec:
-  weight: 1
-  capability:
-cpu: "0.001"
diff --git 
a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue0-enable-queue1.yml
 
b/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue0-enable-queue1.yml
deleted file mode 100644
index 82e479478cc..000
--- 
a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue0-enable-queue1.yml
+++ /dev/null
@@ -1,31 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-apiVersion: scheduling.volcano.sh/v1beta1
-kind: Queue
-metadata:
-  name: queue0
-spec:
-  weight: 1
-  capability:
-cpu: "0.001"

-apiVersion: scheduling.volcano.sh/v1beta1
-kind: Queue
-metadata:
-  name: queue1
-spec:
-  weight: 1
diff --git 
a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/driver-podgroup-template-cpu-2u.yml
 
b/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/driver

[spark-docker] branch master updated: [SPARK-41287][INFRA] Add test workflow to help self-build image test in fork repo

2022-11-28 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new cfcbeac  [SPARK-41287][INFRA] Add test workflow to help self-build 
image test in fork repo
cfcbeac is described below

commit cfcbeac5d2b922a5ee7dfd2b4a5cf08072c827b7
Author: Yikun Jiang 
AuthorDate: Mon Nov 28 17:55:18 2022 +0800

[SPARK-41287][INFRA] Add test workflow to help self-build image test in 
fork repo

### What changes were proposed in this pull request?
This patch adds a test workflow to help developers test images in their 
fork repos.


![image](https://user-images.githubusercontent.com/1736354/204183109-e2341397-251e-42a0-b5f7-c1c1f9334ff9.png)

such as:
- https://github.com/Yikun/spark-docker/actions/runs/3552072792/jobs/5966742869
- https://github.com/Yikun/spark-docker/actions/runs/3561513498/jobs/5982485960
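
The heart of the change is how the test image URL is selected; a hedged 
Python sketch of the same logic (the function name is illustrative):

```
# Build mode pushes to a local registry; test mode reuses an already
# published {repository}/spark:{image-tag} image, mirroring main.yml below.
def test_image_url(build: bool, owner: str, spark: str, tag: str,
                   repository: str = "ghcr.io/apache/spark-docker",
                   image_tag: str = "latest") -> str:
    if build:
        test_repo = f"localhost:5000/{owner.lower()}/spark-docker"
        unique_tag = f"{spark}-{tag}"
    else:
        test_repo = repository
        unique_tag = image_tag
    return f"{test_repo}/spark:{unique_tag}"


print(test_image_url(True, "Yikun", "3.3.1", "scala2.12-java11-python3-ubuntu"))
# localhost:5000/yikun/spark-docker/spark:3.3.1-scala2.12-java11-python3-ubuntu
```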

### Why are the changes needed?
Helps devs/users test their own images in their fork repos.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Test in my fork repo:
https://github.com/Yikun/spark-docker/actions/workflows/test.yml

Closes #26 from Yikun/test-workflow.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/main.yml  | 28 +++--
 .github/workflows/publish.yml   |  2 +-
 .github/workflows/{publish.yml => test.yml} | 62 -
 3 files changed, 60 insertions(+), 32 deletions(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index ebafcdc..fd37990 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -37,13 +37,18 @@ on:
 required: true
 type: string
 default: 11
+  build:
+description: Build the image or not.
+required: false
+type: boolean
+default: true
   publish:
 description: Publish the image or not.
 required: false
 type: boolean
 default: false
   repository:
-description: The registry to be published (Avaliable only when publish 
is selected).
+description: The registry to be published/tested. (Available only in 
publish/test workflow)
 required: false
 type: string
 default: ghcr.io/apache/spark-docker
@@ -52,6 +57,11 @@ on:
 required: false
 type: string
 default: python
+  image-tag:
+type: string
+description: The image tag to be tested. (Available only in test 
workflow)
+required: false
+default: latest
 
 jobs:
   main:
@@ -83,11 +93,18 @@ jobs:
   esac
   TAG=scala${{ inputs.scala }}-java${{ inputs.java }}-$SUFFIX
 
-  REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr '[:upper:]' 
'[:lower:]')
-  TEST_REPO=localhost:5000/$REPO_OWNER/spark-docker
   IMAGE_NAME=spark
   IMAGE_PATH=${{ inputs.spark }}/$TAG
-  UNIQUE_IMAGE_TAG=${{ inputs.spark }}-$TAG
+  if [ "${{ inputs.build }}" == "true" ]; then
+# Use the local registry to build and test
+REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr 
'[:upper:]' '[:lower:]')
+TEST_REPO=localhost:5000/$REPO_OWNER/spark-docker
+UNIQUE_IMAGE_TAG=${{ inputs.spark }}-$TAG
+  else
+# Use specified {repository}/spark:{image-tag} image to test
+TEST_REPO=${{ inputs.repository }}
+UNIQUE_IMAGE_TAG=${{ inputs.image-tag }}
+  fi
   IMAGE_URL=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG
 
   PUBLISH_REPO=${{ inputs.repository }}
@@ -119,15 +136,18 @@ jobs:
   echo "PUBLISH_IMAGE_URL:"${PUBLISH_IMAGE_URL}
 
   - name: Build - Set up QEMU
+if: ${{ inputs.build }}
 uses: docker/setup-qemu-action@v2
 
   - name: Build - Set up Docker Buildx
+if: ${{ inputs.build }}
 uses: docker/setup-buildx-action@v2
 with:
   # This required by local registry
   driver-opts: network=host
 
   - name: Build - Build and push test image
+if: ${{ inputs.build }}
 uses: docker/build-push-action@v3
 with:
   context: ${{ env.IMAGE_PATH }}
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
index 4a07f5d..2941cfb 100644
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -36,7 +36,7 @@ on:
 type: boolean
 required: true
   repository:
-description: The registry to be published (Avaliable only when publish 
is true).
+description: The registry to be published (Available only when publish 
is true).
 required

[spark-docker] branch master updated: [SPARK-41269][INFRA] Move image matrix into version's workflow

2022-11-27 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new d58e178  [SPARK-41269][INFRA] Move image matrix into version's workflow
d58e178 is described below

commit d58e17890f07b4c8c8d212775a53c48dc3a6ce42
Author: Yikun Jiang 
AuthorDate: Mon Nov 28 09:36:54 2022 +0800

[SPARK-41269][INFRA] Move image matrix into version's workflow

### What changes were proposed in this pull request?
This patch refactors the main workflow:
- Move the image matrix into the version's workflow to make the main 
workflow clearer. This will also help downstream repos validate only the 
specified image type.
- Move the build steps into the same section

### Why are the changes needed?
This will help downstream repos validate only the specified image type.

After this patch, we will add a test that reuses the spark-docker workflow 
(like 
https://github.com/yikun/spark-docker/commit/45044cee2e8919de7e7353e74f8ca612ad16629a)
to help developers/users test their self-built images; a sketch of the tag 
computation follows below.
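
A hedged sketch of the image-type to tag mapping the reworked workflow 
computes (it mirrors the `case` statement in the diff below):

```
SUFFIXES = {
    "all": "python3-r-ubuntu",
    "python": "python3-ubuntu",
    "r": "r-ubuntu",
    "scala": "ubuntu",
}


def image_tag(scala: str, java: str, image_type: str) -> str:
    # e.g. ("2.12", "11", "python") -> "scala2.12-java11-python3-ubuntu"
    return f"scala{scala}-java{java}-{SUFFIXES[image_type]}"


print(image_tag("2.12", "11", "python"))  # scala2.12-java11-python3-ubuntu
```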

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes #25 from Yikun/matrix-refactor.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/build_3.3.0.yaml |  4 ++
 .github/workflows/build_3.3.1.yaml |  4 ++
 .github/workflows/main.yml | 76 --
 .github/workflows/publish.yml  |  2 +
 4 files changed, 51 insertions(+), 35 deletions(-)

diff --git a/.github/workflows/build_3.3.0.yaml 
b/.github/workflows/build_3.3.0.yaml
index 7e7ce39..a4f8224 100644
--- a/.github/workflows/build_3.3.0.yaml
+++ b/.github/workflows/build_3.3.0.yaml
@@ -30,6 +30,9 @@ on:
 
 jobs:
   run-build:
+strategy:
+  matrix:
+image-type: ["all", "python", "scala", "r"]
 name: Run
 secrets: inherit
 uses: ./.github/workflows/main.yml
@@ -37,3 +40,4 @@ jobs:
   spark: 3.3.0
   scala: 2.12
   java: 11
+  image-type: ${{ matrix.image-type }}
diff --git a/.github/workflows/build_3.3.1.yaml 
b/.github/workflows/build_3.3.1.yaml
index f6a4b7d..9e5c082 100644
--- a/.github/workflows/build_3.3.1.yaml
+++ b/.github/workflows/build_3.3.1.yaml
@@ -30,6 +30,9 @@ on:
 
 jobs:
   run-build:
+strategy:
+  matrix:
+image-type: ["all", "python", "scala", "r"]
 name: Run
 secrets: inherit
 uses: ./.github/workflows/main.yml
@@ -37,3 +40,4 @@ jobs:
   spark: 3.3.1
   scala: 2.12
   java: 11
+  image-type: ${{ matrix.image-type }}
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 024b853..ebafcdc 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -47,6 +47,11 @@ on:
 required: false
 type: string
 default: ghcr.io/apache/spark-docker
+  image-type:
+description: The image type of the image (all, python, scala, r).
+required: false
+type: string
+default: python
 
 jobs:
   main:
@@ -60,41 +65,33 @@ jobs:
 image: registry:2
 ports:
   - 5000:5000
-strategy:
-  matrix:
-spark_version:
-  - ${{ inputs.spark }}
-scala_version:
-  - ${{ inputs.scala }}
-java_version:
-  - ${{ inputs.java }}
-image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
 steps:
   - name: Checkout Spark Docker repository
 uses: actions/checkout@v3
 
-  - name: Set up QEMU
-uses: docker/setup-qemu-action@v2
-
-  - name: Set up Docker Buildx
-uses: docker/setup-buildx-action@v2
-with:
-  # This required by local registry
-  driver-opts: network=host
-
-  - name: Generate tags
+  - name: Prepare - Generate tags
 run: |
-  TAG=scala${{ matrix.scala_version }}-java${{ matrix.java_version 
}}-${{ matrix.image_suffix }}
+  case "${{ inputs.image-type }}" in
+  all) SUFFIX=python3-r-ubuntu
+  ;;
+  python) SUFFIX=python3-ubuntu
+  ;;
+  r) SUFFIX=r-ubuntu
+  ;;
+  scala) SUFFIX=ubuntu
+  ;;
+  esac
+  TAG=scala${{ inputs.scala }}-java${{ inputs.java }}-$SUFFIX
 
   REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr '[:upper:]' 
'[:lower:]')
   TEST_REPO=localhost:5000/$REPO_OWNER/spark-docker
   IMAGE_NAME=spark
-  IMAGE_PATH=${{ matrix.spark_version }}/$TAG
-  UNIQUE_IMAGE_TAG=${{ matrix.spark_version }}-$TAG
+  IMAGE_PATH=${{ inputs.spark }}/$TAG
+  UNIQUE_IMAGE_TAG=${{ inputs.s

[spark-docker] branch master updated: [SPARK-41258][INFRA] Upgrade docker and actions to clean up warnings

2022-11-24 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 33abc18  [SPARK-41258][INFRA] Upgrade docker and actions to clean up 
warnings
33abc18 is described below

commit 33abc1894f3de135e827ce393842ca355229c117
Author: Yikun Jiang 
AuthorDate: Fri Nov 25 14:57:27 2022 +0800

[SPARK-41258][INFRA] Upgrade docker and actions to clean up warnings

### What changes were proposed in this pull request?
- Upgrade `actions/checkout` from v2 to v3
- Upgrade `docker/build-push-action` from v2 to v3

### Why are the changes needed?
Clean up the set-output and lower-version Node warnings.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Test passed

Closes #24 from Yikun/upgrade-actions.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/main.yml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index dfb99e9..024b853 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -71,7 +71,7 @@ jobs:
 image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
 steps:
   - name: Checkout Spark Docker repository
-uses: actions/checkout@v2
+uses: actions/checkout@v3
 
   - name: Set up QEMU
 uses: docker/setup-qemu-action@v2
@@ -122,7 +122,7 @@ jobs:
   echo "PUBLISH_IMAGE_URL:"${PUBLISH_IMAGE_URL}
 
   - name: Build and push test image
-uses: docker/build-push-action@v2
+uses: docker/build-push-action@v3
 with:
   context: ${{ env.IMAGE_PATH }}
   tags: ${{ env.IMAGE_URL }}
@@ -258,7 +258,7 @@ jobs:
 
   - name: Publish - Push Image
 if: ${{ inputs.publish }}
-uses: docker/build-push-action@v2
+uses: docker/build-push-action@v3
 with:
   context: ${{ env.IMAGE_PATH }}
   push: true





[spark] branch master updated (a205e97ad9a -> 575b8f00faf)

2022-11-24 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from a205e97ad9a [SPARK-41230][CONNECT][PYTHON] Remove `str` from Aggregate 
expression type
 add 575b8f00faf [SPARK-41257][INFRA] Upgrade actions/labeler to v4

No new revisions were added by this update.

Summary of changes:
 .github/workflows/labeler.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





[spark] branch master updated (033dbe604bc -> 71b5c5bde75)

2022-11-24 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 033dbe604bc [SPARK-41247][BUILD] Unify the Protobuf versions in Spark 
connect and Protobuf connector
 add 71b5c5bde75 [SPARK-41251][PS][INFRA] Upgrade pandas from 1.5.1 to 1.5.2

No new revisions were added by this update.

Summary of changes:
 dev/infra/Dockerfile   | 4 ++--
 python/pyspark/pandas/supported_api_gen.py | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)





[spark] branch master updated (246479c8c5c -> 6e6e8560557)

2022-11-18 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 246479c8c5c [SPARK-41196][CONNECT][FOLLOW-UP] Fix out of sync 
generated files for Python
 add 6e6e8560557 [SPARK-41186][INFRA][PS][TESTS] Upgrade infra and replace 
`list_run_infos` with `search_runs` in mlflow doctest

No new revisions were added by this update.

Summary of changes:
 dev/infra/Dockerfile| 12 +---
 python/pyspark/pandas/mlflow.py |  4 ++--
 2 files changed, 7 insertions(+), 9 deletions(-)





[spark] branch master updated: [SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI

2022-11-18 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 12a77bb22f1 [SPARK-41107][PYTHON][INFRA][TESTS] Install 
memory-profiler in the CI
12a77bb22f1 is described below

commit 12a77bb22f1689e361a5efe2d7000aead74ebc43
Author: Xinrong Meng 
AuthorDate: Fri Nov 18 17:12:39 2022 +0800

[SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI

### What changes were proposed in this pull request?
Install [memory-profiler](https://pypi.org/project/memory-profiler/) in the 
CI in order to enable memory profiling tests.

### Why are the changes needed?
That's a sub-task of 
[SPARK-40281](https://issues.apache.org/jira/browse/SPARK-40281) Memory 
Profiler on Executors.

PySpark memory profiler depends on memory-profiler. The PR proposes to 
install memory-profiler in the CI to enable related tests.
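
For context, a minimal sketch of the availability gate this patch relies on 
(the `has_memory_profiler` flag comes from `pyspark.profiler` in the diff; 
the test case here is hypothetical):

```
import unittest

try:
    import memory_profiler  # noqa: F401
    has_memory_profiler = True
except ImportError:
    has_memory_profiler = False


@unittest.skipIf(not has_memory_profiler, "Must have memory-profiler installed.")
class MemoryProfilerSmokeTest(unittest.TestCase):  # illustrative only
    def test_memory_usage(self):
        from memory_profiler import memory_usage
        # Sample memory while running a trivial workload.
        samples = memory_usage((sum, (range(1000),)))
        self.assertGreaterEqual(len(samples), 1)


if __name__ == "__main__":
    unittest.main()
```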

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

Closes #38611 from xinrong-meng/ci_mp.

Lead-authored-by: Xinrong Meng 
Co-authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 dev/infra/Dockerfile | 3 +++
 python/pyspark/tests/test_memory_profiler.py | 8 +---
 python/pyspark/tests/test_profiler.py| 2 ++
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 96b20894b87..a6331c2ead4 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -68,3 +68,6 @@ ENV R_LIBS_SITE 
"/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library
 
 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install grpcio protobuf
+
+# SPARK-41186: Move memory-profiler to pyspark deps install when mlfow doctest 
test fix
+RUN python3.9 -m pip install 'memory-profiler==0.60.0'
diff --git a/python/pyspark/tests/test_memory_profiler.py 
b/python/pyspark/tests/test_memory_profiler.py
index 7da82dccb37..3dc8ce4ce22 100644
--- a/python/pyspark/tests/test_memory_profiler.py
+++ b/python/pyspark/tests/test_memory_profiler.py
@@ -27,17 +27,11 @@ from unittest import mock
 import pandas as pd
 
 from pyspark import SparkConf, SparkContext
+from pyspark.profiler import has_memory_profiler
 from pyspark.sql import SparkSession
 from pyspark.sql.functions import pandas_udf, udf
 from pyspark.testing.utils import PySparkTestCase
 
-try:
-import memory_profiler  # type: ignore[import] # noqa: F401
-
-has_memory_profiler = True
-except Exception:
-has_memory_profiler = False
-
 
 @unittest.skipIf(not has_memory_profiler, "Must have memory-profiler 
installed.")
 class MemoryProfilerTests(PySparkTestCase):
diff --git a/python/pyspark/tests/test_profiler.py 
b/python/pyspark/tests/test_profiler.py
index ceae904ca6f..8a078d36b46 100644
--- a/python/pyspark/tests/test_profiler.py
+++ b/python/pyspark/tests/test_profiler.py
@@ -22,6 +22,7 @@ import unittest
 from io import StringIO
 
 from pyspark import SparkConf, SparkContext, BasicProfiler
+from pyspark.profiler import has_memory_profiler
 from pyspark.sql import SparkSession
 from pyspark.sql.functions import udf
 from pyspark.sql.utils import PythonException
@@ -126,6 +127,7 @@ class ProfilerTests2(unittest.TestCase):
 finally:
 sc.stop()
 
+@unittest.skipIf(has_memory_profiler, "Test when memory-profiler is not 
installed.")
 def test_no_memory_profile_installed(self):
 sc = SparkContext(
 conf=SparkConf()





[spark-docker] branch master updated: [SPARK-40519] Add "Publish" workflow to help release apache/spark image

2022-11-15 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new f488d73  [SPARK-40519] Add "Publish" workflow to help release 
apache/spark image
f488d73 is described below

commit f488d732d254caa78c1e1a2ef74958e6c867dad6
Author: Yikun Jiang 
AuthorDate: Tue Nov 15 21:32:30 2022 +0800

[SPARK-40519] Add "Publish" workflow to help release apache/spark image

### What changes were proposed in this pull request?
The publish workflow includes 3 steps:
1. First, build the local image.
2. Pass the related tests (K8s test / standalone test) using the image from 
the first step.
3. After all tests pass, publish to `ghcr` (this might help RC testing) or 
`dockerhub`.

It takes about 30-40 mins to publish all images.

Add "Publish" workflow to help release apache/spark image.

![image](https://user-images.githubusercontent.com/1736354/201015477-30428444-0ed5-4436-8b59-7420c678c4a6.png)
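
The publish tags are computed by the new `tools/manifest.py` (see the diff 
below); a hedged Python sketch of that kind of lookup over `versions.json` 
— the real tool's schema and CLI flags may differ, and the top-level 
`"versions"` key is an assumption:

```
import json


def publish_urls(image: str, path: str, versions_file: str = "versions.json"):
    # Map an image path (e.g. "3.3.0/scala2.12-java11-python3-ubuntu")
    # to the full list of publish URLs, one per tag.
    with open(versions_file) as f:
        entries = json.load(f)["versions"]  # assumed top-level key
    for entry in entries:
        if entry["path"] == path:
            return [f"{image}:{tag}" for tag in entry["tags"]]
    raise KeyError(f"no entry for {path}")
```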

### Why are the changes needed?
One click to create the `apache/spark` image.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
1. Set default branch in my fork repo
2. Run workflow manually, 
https://github.com/Yikun/spark-docker/actions/workflows/publish.yml?query=is%3Asuccess

Closes #23 from Yikun/workflow.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/main.yml| 43 +++
 .github/workflows/publish.yml | 66 ++
 tools/manifest.py | 82 +++
 versions.json | 64 +
 4 files changed, 255 insertions(+)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index accf8ae..dfb99e9 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -37,6 +37,16 @@ on:
 required: true
 type: string
 default: 11
+  publish:
+description: Publish the image or not.
+required: false
+type: boolean
+default: false
+  repository:
+description: The registry to be published (Avaliable only when publish 
is selected).
+required: false
+type: string
+default: ghcr.io/apache/spark-docker
 
 jobs:
   main:
@@ -83,6 +93,9 @@ jobs:
   UNIQUE_IMAGE_TAG=${{ matrix.spark_version }}-$TAG
   IMAGE_URL=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG
 
+  PUBLISH_REPO=${{ inputs.repository }}
+  PUBLISH_IMAGE_URL=`tools/manifest.py tags -i 
${PUBLISH_REPO}/${IMAGE_NAME} -p ${{ matrix.spark_version }}/${TAG}`
+
   # Unique image tag in each version: 
3.3.0-scala2.12-java11-python3-ubuntu
   echo "UNIQUE_IMAGE_TAG=${UNIQUE_IMAGE_TAG}" >> $GITHUB_ENV
   # Test repo: ghcr.io/apache/spark-docker
@@ -94,6 +107,9 @@ jobs:
   # Image URL: 
ghcr.io/apache/spark-docker/spark:3.3.0-scala2.12-java11-python3-ubuntu
   echo "IMAGE_URL=${IMAGE_URL}" >> $GITHUB_ENV
 
+  echo "PUBLISH_REPO=${PUBLISH_REPO}" >> $GITHUB_ENV
+  echo "PUBLISH_IMAGE_URL=${PUBLISH_IMAGE_URL}" >> $GITHUB_ENV
+
   - name: Print Image tags
 run: |
   echo "UNIQUE_IMAGE_TAG: "${UNIQUE_IMAGE_TAG}
@@ -102,6 +118,9 @@ jobs:
   echo "IMAGE_PATH: "${IMAGE_PATH}
   echo "IMAGE_URL: "${IMAGE_URL}
 
+  echo "PUBLISH_REPO:"${PUBLISH_REPO}
+  echo "PUBLISH_IMAGE_URL:"${PUBLISH_IMAGE_URL}
+
   - name: Build and push test image
 uses: docker/build-push-action@v2
 with:
@@ -221,3 +240,27 @@ jobs:
 with:
   name: spark-on-kubernetes-it-log
   path: "**/target/integration-tests.log"
+
+  - name: Publish - Login to GitHub Container Registry
+if: ${{ inputs.publish }}
+uses: docker/login-action@v2
+with:
+  registry: ghcr.io
+  username: ${{ github.actor }}
+  password: ${{ secrets.GITHUB_TOKEN }}
+
+  - name: Publish - Login to Dockerhub Registry
+if: ${{ inputs.publish }}
+uses: docker/login-action@v2
+with:
+  username: ${{ secrets.DOCKERHUB_USER }}
+  password: ${{ secrets.DOCKERHUB_TOKEN }}
+
+  - name: Publish - Push Image
+if: ${{ inputs.publish }}
+uses: docker/build-push-action@v2
+with:
+  context: ${{ env.IMAGE_PATH }}
+  push: true
+  tags: ${{ env.PUBLISH_IMAGE_URL }}
+  platforms: linux/amd64,linux/arm64
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
new file mode 100644
index 000..a44153b

[spark-docker] branch master updated: [SPARK-40569][TESTS] Add smoke test in standalone cluster for spark-docker

2022-11-08 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 52152c1  [SPARK-40569][TESTS] Add smoke test in standalone cluster for 
spark-docker
52152c1 is described below

commit 52152c1b6d70acc2e7c5e32bffe0265b55df7b6f
Author: Qian.Sun 
AuthorDate: Wed Nov 9 09:34:47 2022 +0800

[SPARK-40569][TESTS] Add smoke test in standalone cluster for spark-docker

### What changes were proposed in this pull request?

This PR aims to add a smoke test in a standalone cluster for the spark-docker repo.

### Why are the changes needed?

Verify that the Spark Docker image works normally in a standalone cluster.
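
The added `testing.sh` below polls the master/worker web UIs before 
submitting the smoke job; a hedged Python sketch of that retry loop, reusing 
its knobs (timeout 1s, cooldown 1s, up to 30 tries):

```
import time
import urllib.request


def wait_for_ui(url: str, timeout: int = 1, cooldown: int = 1,
                max_tries: int = 30) -> bool:
    # Poll until the endpoint answers 200 or the tries run out.
    for _ in range(max_tries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass
        time.sleep(cooldown)
    return False


# e.g. wait_for_ui("http://localhost:8080")  # master web UI port below
```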

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

New test in GA.

Closes #21 from dcoliversun/SPARK-40569.

Authored-by: Qian.Sun 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/main.yml |   3 +
 testing/run_tests.sh   |  25 ++
 testing/testing.sh | 207 +
 3 files changed, 235 insertions(+)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 08bba68..accf8ae 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -110,6 +110,9 @@ jobs:
   platforms: linux/amd64,linux/arm64
   push: true
 
+  - name : Test - Run spark application for standalone cluster on docker
+run: testing/run_tests.sh --image-url $IMAGE_URL --scala-version ${{ 
matrix.scala_version }} --spark-version ${{ matrix.spark_version }}
+
   - name: Test - Checkout Spark repository
 uses: actions/checkout@v3
 with:
diff --git a/testing/run_tests.sh b/testing/run_tests.sh
new file mode 100755
index 000..c612dcd
--- /dev/null
+++ b/testing/run_tests.sh
@@ -0,0 +1,25 @@
+#!/bin/bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+set -eo errexit
+
+SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)
+
+. "${SCRIPT_DIR}/testing.sh"
+
+echo "Test successfully finished"
diff --git a/testing/testing.sh b/testing/testing.sh
new file mode 100755
index 000..d399d6d
--- /dev/null
+++ b/testing/testing.sh
@@ -0,0 +1,207 @@
+#!/bin/bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This test script runs a simple smoke test in standalone cluster:
+# - create docker network
+# - start up a master
+# - start up a worker
+# - wait for the web UI endpoint to return successfully
+# - run a simple smoke test in standalone cluster
+# - clean up test resource
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-worker
+SPARK_MASTER_PORT=7077
+SPARK_MASTER_WEBUI_CONTAINER_PORT=8080
+SPARK_MASTER_WEBUI_HOST_PORT=8080
+SPARK_WORKER_WEBUI_CONTAINER_PORT=8081
+SPARK_WORKER_WEBUI_HOST_PORT=8081
+
+SCALA_VERSION="2.12"
+SPARK_VERSION="3.3.0"
+IMAGE_URL=
+
+# Create a new docker bridge network
+function create_network() {
+  if [ ! -z $(docker network ls --filter name=^${NETWORK_NAME}$ --format="{{ 
.Name }}") ]; then
+# bridge network already exists, need to kill containers attac

[spark-docker] branch master updated: [SPARK-40969] Replace spark TGZ url with apache archive url

2022-10-31 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 243ce20  [SPARK-40969] Replace spark TGZ url with apache archive url
243ce20 is described below

commit 243ce201296c20ae48b32a87d254800e8ad197ef
Author: Qian.Sun 
AuthorDate: Tue Nov 1 11:14:04 2022 +0800

[SPARK-40969] Replace spark TGZ url with apache archive url

### What changes were proposed in this pull request?

This PR aims to replace the Spark TGZ URL with the Apache archive URL.

### Why are the changes needed?

```
#13 [linux/amd64 4/9] RUN set -ex; export SPARK_TMP="$(mktemp -d)"; 
cd $SPARK_TMP; wget -nv -O spark.tgz 
"https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz;; 
wget -nv -O spark.tgz.asc 
"https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc;;
 export GNUPGHOME="$(mktemp -d)"; gpg --keyserver 
hkps://keys.openpgp.org --recv-key "80FB8EBE8EBA68504989703491B5DC815DBF10D3" 
|| gpg --keyserver hkps://keyserver.ubuntu.com  [...]
#0 0.132 ++ mktemp -d
#0 0.133 + export SPARK_TMP=/tmp/tmp.oEdW8CyP9h
#0 0.133 + SPARK_TMP=/tmp/tmp.oEdW8CyP9h
#0 0.133 + cd /tmp/tmp.oEdW8CyP9h
#0 0.133 + wget -nv -O spark.tgz 
https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
#0 0.152 
https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz:
#0 0.152 2022-10-31 04:06:44 ERROR 404: Not Found.
#13 ERROR: process "/bin/sh -c set -ex; export SPARK_TMP=\"$(mktemp 
-d)\"; cd $SPARK_TMP; wget -nv -O spark.tgz \"$SPARK_TGZ_URL\"; 
wget -nv -O spark.tgz.asc \"$SPARK_TGZ_ASC_URL\"; export 
GNUPGHOME=\"$(mktemp -d)\"; gpg --keyserver hkps://keys.openpgp.org 
--recv-key \"$GPG_KEY\" || gpg --keyserver hkps://keyserver.ubuntu.com 
--recv-keys \"$GPG_KEY\"; gpg --batch --verify spark.tgz.asc spark.tgz; 
gpgconf --kill all; rm -rf \"$GNUPGHOME\" spark.t [...]
```
The old URL 
`https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz` is 
not found. It is better to use the Apache archive URL.
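
`archive.apache.org` keeps every release, while `dlcdn.apache.org` only 
serves current ones; a hedged sketch probing both (the patch itself simply 
pins the Dockerfiles to the archive URL):

```
import urllib.request


def spark_tgz_url(version: str, hadoop: str = "hadoop3") -> str:
    # Prefer the CDN, fall back to the archive, which never drops releases.
    name = f"spark-{version}/spark-{version}-bin-{hadoop}.tgz"
    for base in (f"https://dlcdn.apache.org/spark/{name}",
                 f"https://archive.apache.org/dist/spark/{name}"):
        req = urllib.request.Request(base, method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                if resp.status == 200:
                    return base
        except OSError:
            continue
    raise RuntimeError(f"no mirror serves {name}")


# For 3.3.0 today this resolves to the archive.apache.org URL.
```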

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No need to add new tests.

Closes #22 from dcoliversun/SPARK-40969.

Authored-by: Qian.Sun 
Signed-off-by: Yikun Jiang 
---
 3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 4 ++--
 3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile   | 4 ++--
 3.3.0/scala2.12-java11-r-ubuntu/Dockerfile | 4 ++--
 3.3.0/scala2.12-java11-ubuntu/Dockerfile   | 4 ++--
 3.3.1/scala2.12-java11-python3-r-ubuntu/Dockerfile | 4 ++--
 3.3.1/scala2.12-java11-python3-ubuntu/Dockerfile   | 4 ++--
 3.3.1/scala2.12-java11-r-ubuntu/Dockerfile | 4 ++--
 3.3.1/scala2.12-java11-ubuntu/Dockerfile   | 4 ++--
 Dockerfile.template| 4 ++--
 9 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile 
b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
index 8c2761e..fb48b80 100644
--- a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -42,8 +42,8 @@ RUN set -ex && \
 
 # Install Apache Spark
 # https://downloads.apache.org/spark/KEYS
-ENV 
SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
 \
-
SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc
 \
+ENV 
SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
 \
+
SPARK_TGZ_ASC_URL=https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc
 \
 GPG_KEY=80FB8EBE8EBA68504989703491B5DC815DBF10D3
 
 RUN set -ex; \
diff --git a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile 
b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
index 6a0017a..1b6a02c 100644
--- a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
@@ -41,8 +41,8 @@ RUN set -ex && \
 
 # Install Apache Spark
 # https://downloads.apache.org/spark/KEYS
-ENV 
SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
 \
-
SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc
 \
+ENV 
SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
 \
+
SPARK_TGZ_ASC_URL=https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc
 \
 GPG_KEY=80FB8EBE8EBA68504989703491B5DC815DBF10D3
 
 RUN set -ex; \
diff --git a/3.3.0/sca

[spark] branch master updated: [SPARK-40229][PS][TEST][FOLLOWUP] Add `openpyxl` to `requirements.txt`

2022-10-28 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5c9843db2b3 [SPARK-40229][PS][TEST][FOLLOWUP] Add `openpyxl` to 
`requirements.txt`
5c9843db2b3 is described below

commit 5c9843db2b3ddec0b03374df03dcaa1847941c34
Author: Dongjoon Hyun 
AuthorDate: Fri Oct 28 19:05:38 2022 +0800

[SPARK-40229][PS][TEST][FOLLOWUP] Add `openpyxl` to `requirements.txt`

### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/37671.

### Why are the changes needed?

Since https://github.com/apache/spark/pull/37671 added `openpyxl` for 
PySpark test environments and re-enabled `test_to_excel` test, we need to add 
it to `requirements.txt` as PySpark test dependency explicitly.

### Does this PR introduce _any_ user-facing change?

No. This is a test dependency.

### How was this patch tested?

Manually.

Closes #38425 from dongjoon-hyun/SPARK-40229.

Authored-by: Dongjoon Hyun 
Signed-off-by: Yikun Jiang 
---
 dev/requirements.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/dev/requirements.txt b/dev/requirements.txt
index fa4b6752f14..2f32066d6a8 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -13,6 +13,7 @@ matplotlib<3.3.0
 
 # PySpark test dependencies
 unittest-xml-reporting
+openpyxl
 
 # PySpark test dependencies (optional)
 coverage





[spark-docker] branch master updated: [SPARK-40855] Add CONTRIBUTING.md for apache/spark-docker

2022-10-24 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new f6bab6b  [SPARK-40855] Add CONTRIBUTING.md for apache/spark-docker
f6bab6b is described below

commit f6bab6be5ddcd41d2b6c1b0c139316bc311e13aa
Author: Qian.Sun 
AuthorDate: Tue Oct 25 10:15:10 2022 +0800

[SPARK-40855] Add CONTRIBUTING.md for apache/spark-docker

### What changes were proposed in this pull request?

This PR aims to add `CONTRIBUTING.md` for apache/spark-docker.

### Why are the changes needed?

Better to briefly explain how to contribute to the DOI (Docker Official Images).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?


![image](https://user-images.githubusercontent.com/44011673/197155544-bfae0c70-ee01-44b0-851d-ed5c288129d9.png)

Closes #19 from dcoliversun/SPARK-40855.

Authored-by: Qian.Sun 
Signed-off-by: Yikun Jiang 
---
 CONTRIBUTING.md | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 000..4ba4baa
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,22 @@
+## Contributing to Spark Docker
+
+Thanks for improving the project! *Before opening a pull request*, review the 
+[Contributing to Spark guide](https://spark.apache.org/contributing.html). 
+It lists steps that are required before creating a PR. In particular, consider:
+
+- Is the change important and ready enough to ask the community to spend time 
reviewing?
+- Have you searched for existing, related JIRAs and pull requests?
+- Is this a new feature that can stand alone as a [third party 
project](https://spark.apache.org/third-party-projects.html) ?
+- Is the change being proposed clearly explained and motivated?
+
+When you contribute code, you affirm that the contribution is your original 
work and that you 
+license the work to the project under the project's open source license. 
Whether or not you 
+state this explicitly, by submitting any copyrighted material via pull 
request, email, or 
+other means you agree to license the material under the project's open source 
license and 
+warrant that you have the legal authority to do so.
+
+### How to update Dockerfile
+
+- Update `Dockerfile.template`
+- Update `tools/template.py` if need template file render change
+- Exec `add-dockerfiles.sh `
\ No newline at end of file





[spark] branch master updated: [SPARK-40882][INFRA] Upgrade actions/setup-java to v3 with distribution specified

2022-10-24 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 825f2190bd8 [SPARK-40882][INFRA] Upgrade actions/setup-java to v3 with 
distribution specified
825f2190bd8 is described below

commit 825f2190bd826a8a877739454393e79ef163fdf1
Author: Yikun Jiang 
AuthorDate: Mon Oct 24 14:51:26 2022 +0800

[SPARK-40882][INFRA] Upgrade actions/setup-java to v3 with distribution 
specified

### What changes were proposed in this pull request?
Upgrade actions/setup-java to v3 with distribution specified

### Why are the changes needed?

- The `distribution` is required after v2; for now just keep `zulu` (the 
same distribution as v1): https://github.com/actions/setup-java/releases/tag/v2.0.0
- https://github.com/actions/setup-java/releases/tag/v3.0.0: Upgrade node
- https://github.com/actions/setup-java/releases/tag/v3.6.0: Cleanup 
set-output warning

### Does this PR introduce _any_ user-facing change?
No, dev only.

### How was this patch tested?
CI passed

Closes #38354 from Yikun/SPARK-40882.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/benchmark.yml|  6 --
 .github/workflows/build_and_test.yml   | 27 ++-
 .github/workflows/publish_snapshot.yml |  3 ++-
 3 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
index 227c444a7a4..8671cff054b 100644
--- a/.github/workflows/benchmark.yml
+++ b/.github/workflows/benchmark.yml
@@ -105,8 +105,9 @@ jobs:
 run: cd tpcds-kit/tools && make OS=LINUX
   - name: Install Java ${{ github.event.inputs.jdk }}
 if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
-uses: actions/setup-java@v1
+uses: actions/setup-java@v3
 with:
+  distribution: temurin
   java-version: ${{ github.event.inputs.jdk }}
   - name: Generate TPC-DS (SF=1) table data
 if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
@@ -156,8 +157,9 @@ jobs:
 restore-keys: |
   benchmark-coursier-${{ github.event.inputs.jdk }}
 - name: Install Java ${{ github.event.inputs.jdk }}
-  uses: actions/setup-java@v1
+  uses: actions/setup-java@v3
   with:
+distribution: temurin
 java-version: ${{ github.event.inputs.jdk }}
 - name: Cache TPC-DS generated data
   if: contains(github.event.inputs.class, 'TPCDSQueryBenchmark') || 
contains(github.event.inputs.class, '*')
diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 0e0314e2950..688c40cc3b6 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -227,8 +227,9 @@ jobs:
 restore-keys: |
   ${{ matrix.java }}-${{ matrix.hadoop }}-coursier-
 - name: Install Java ${{ matrix.java }}
-  uses: actions/setup-java@v1
+  uses: actions/setup-java@v3
   with:
+distribution: temurin
 java-version: ${{ matrix.java }}
 - name: Install Python 3.8
   uses: actions/setup-python@v2
@@ -384,8 +385,9 @@ jobs:
 restore-keys: |
   pyspark-coursier-
 - name: Install Java ${{ matrix.java }}
-  uses: actions/setup-java@v1
+  uses: actions/setup-java@v3
   with:
+distribution: temurin
 java-version: ${{ matrix.java }}
 - name: List Python packages (Python 3.9, PyPy3)
   run: |
@@ -473,8 +475,9 @@ jobs:
 restore-keys: |
   sparkr-coursier-
 - name: Install Java ${{ inputs.java }}
-  uses: actions/setup-java@v1
+  uses: actions/setup-java@v3
   with:
+distribution: temurin
 java-version: ${{ inputs.java }}
 - name: Run tests
   env: ${{ fromJSON(inputs.envs) }}
@@ -597,8 +600,9 @@ jobs:
 cd docs
 bundle install
 - name: Install Java 8
-  uses: actions/setup-java@v1
+  uses: actions/setup-java@v3
   with:
+distribution: temurin
 java-version: 8
 - name: Scala linter
   run: ./dev/lint-scala
@@ -664,8 +668,9 @@ jobs:
 restore-keys: |
   java${{ matrix.java }}-maven-
 - name: Install Java ${{ matrix.java }}
-  uses: actions/setup-java@v1
+  uses: actions/setup-java@v3
   with:
+distribution: temurin
 java-version: ${{ matrix.java }}
 - name: Build with Maven
   run: |
@@ -713,8 +718,9 @@ jobs:
 restore-keys: |
   scala-213-coursier-
 - name: Install Java 8
-  uses: actions/setup-java@v1
+  uses: actions/setup-java@v3
   with:
+distribution: temurin
 java-version: 8
 - name: Build with SBT
   run: |
@@ -761,8 +767,9 @@ jobs:
 rest

[spark] branch master updated (58490da6d2e -> c721c7299d8)

2022-10-24 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 58490da6d2e [SPARK-40800][SQL] Always inline expressions in 
OptimizeOneRowRelationSubquery
 add c721c7299d8 [SPARK-40881][INFRA] Upgrade actions/cache to v3 and 
actions/upload-artifact to v3

No new revisions were added by this update.

Summary of changes:
 .github/workflows/benchmark.yml| 14 
 .github/workflows/build_and_test.yml   | 60 +-
 .github/workflows/publish_snapshot.yml |  2 +-
 3 files changed, 38 insertions(+), 38 deletions(-)





[spark] branch master updated (f0950fea814 -> fea6458806d)

2022-10-22 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from f0950fea814 [SPARK-40878][INFRA] pin 'grpcio==1.48.1' 
'protobuf==4.21.6'
 add fea6458806d [SPARK-40870][INFRA] Upgrade docker actions to cleanup 
warning

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml   | 6 +++---
 .github/workflows/build_infra_images_cache.yml | 8 
 2 files changed, 7 insertions(+), 7 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-docker] branch master updated: [SPARK-40864] Remove pip/setuptools dynamic upgrade

2022-10-21 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 52e5856  [SPARK-40864] Remove pip/setuptools dynamic upgrade
52e5856 is described below

commit 52e5856d81e70a9d9e87292c6caf42587ce433df
Author: Yikun Jiang 
AuthorDate: Fri Oct 21 17:02:54 2022 +0800

[SPARK-40864] Remove pip/setuptools dynamic upgrade

### What changes were proposed in this pull request?
Remove pip/setuptools dynamic upgrade in dockerfile

### Why are the changes needed?
According to [official image 
suggestion](https://github.com/docker-library/official-images#repeatability), 
`Rebuilding the same Dockerfile should result in the same version of the image 
being packaged`.

But we used to upgrade pip/setuptools to the latest version, and I cannot
think of any reason we actually need the latest pip/setuptools. I also took a
look at the [initial
commit](https://github.com/apache-spark-on-k8s/spark/commit/befcf0a30651d0335bb57c242a824e43748db33f)
for this line; according to the merge history there is no remaining reason
for it.

### Does this PR introduce _any_ user-facing change?
The OS-recommended pip/setuptools version is used.

### How was this patch tested?

CI passed.

Closes #17 from Yikun/remove-pip.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 1 -
 3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile   | 1 -
 Dockerfile.template| 1 -
 3 files changed, 3 deletions(-)

diff --git a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile 
b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
index ac16bdd..8c2761e 100644
--- a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -26,7 +26,6 @@ RUN set -ex && \
 ln -s /lib /lib64 && \
 apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user 
libnss3 procps net-tools gosu && \
 apt install -y python3 python3-pip && \
-pip3 install --upgrade pip setuptools && \
 apt install -y r-base r-base-dev && \
 mkdir -p /opt/spark && \
 mkdir /opt/spark/python && \
diff --git a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile 
b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
index c6e433d..6a0017a 100644
--- a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
@@ -26,7 +26,6 @@ RUN set -ex && \
 ln -s /lib /lib64 && \
 apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user 
libnss3 procps net-tools gosu && \
 apt install -y python3 python3-pip && \
-pip3 install --upgrade pip setuptools && \
 mkdir -p /opt/spark && \
 mkdir /opt/spark/python && \
 mkdir -p /opt/spark/examples && \
diff --git a/Dockerfile.template b/Dockerfile.template
index 2b90fe5..a220247 100644
--- a/Dockerfile.template
+++ b/Dockerfile.template
@@ -27,7 +27,6 @@ RUN set -ex && \
 apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user 
libnss3 procps net-tools gosu && \
 {%- if HAVE_PY %}
 apt install -y python3 python3-pip && \
-pip3 install --upgrade pip setuptools && \
 {%- endif %}
 {%- if HAVE_R %}
 apt install -y r-base r-base-dev && \


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
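
As a quick sanity check of the result, one can ask the image which pip it now
ships; a small sketch, assuming docker is on PATH and that the image
entrypoint passes arbitrary commands through:

```python
import subprocess

# after this change the reported pip should be the distro-packaged version,
# not whatever "latest" happened to be at build time
out = subprocess.run(
    ["docker", "run", "--rm", "apache/spark", "pip3", "--version"],
    check=True, capture_output=True, text=True,
)
print(out.stdout.strip())
```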



[spark-docker] branch master updated: [SPARK-40866][INFRA] Rename Spark repository as Spark Docker repository in GA

2022-10-21 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 6f56ef1  [SPARK-40866][INFRA] Rename Spark repository as Spark Docker 
repository in GA
6f56ef1 is described below

commit 6f56ef1c8c8bccd05069d4590f7ae084d4c72b4d
Author: Qian.Sun 
AuthorDate: Fri Oct 21 16:02:50 2022 +0800

[SPARK-40866][INFRA] Rename Spark repository as Spark Docker repository in 
GA

### What changes were proposed in this pull request?

This PR aims to rename `Spark repository` to `Spark Docker repository` in
GA; see the discussion at
https://github.com/apache/spark-docker/pull/15#discussion_r1001440707

### Why are the changes needed?

The actual repository is apache/spark-docker.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass the GA

Closes #18 from dcoliversun/SPARK-40866.

Authored-by: Qian.Sun 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/main.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index b47245b..08bba68 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -60,7 +60,7 @@ jobs:
   - ${{ inputs.java }}
 image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
 steps:
-  - name: Checkout Spark repository
+  - name: Checkout Spark Docker repository
 uses: actions/checkout@v2
 
   - name: Set up QEMU


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-40860][INFRA] Change `set-output` to `GITHUB_OUTPUT`

2022-10-20 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 40086cb9b21 [SPARK-40860][INFRA] Change `set-output` to `GITHUB_OUTPUT`
40086cb9b21 is described below

commit 40086cb9b21fe207242c4928d8e2cc3e756d61da
Author: Yikun Jiang 
AuthorDate: Fri Oct 21 11:06:33 2022 +0800

[SPARK-40860][INFRA] Change `set-output` to `GITHUB_OUTPUT`

### What changes were proposed in this pull request?
Change `set-output` to `GITHUB_OUTPUT`.

### Why are the changes needed?
The `set-output` command is deprecated and will be disabled soon. Please 
upgrade to using Environment Files. For more information see: 
https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/

### Does this PR introduce _any_ user-facing change?
No, dev only

### How was this patch tested?
- CI passed
- Also do a local test on benchmark: 
https://github.com/Yikun/spark/actions/runs/3294384181/jobs/5431945626

Closes #38323 from Yikun/set-output.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/benchmark.yml  |  2 +-
 .github/workflows/build_and_test.yml | 13 ++---
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
index 5508227b8b2..f73267a95fa 100644
--- a/.github/workflows/benchmark.yml
+++ b/.github/workflows/benchmark.yml
@@ -54,7 +54,7 @@ jobs:
 steps:
 - name: Generate matrix
   id: set-matrix
-  run: echo "::set-output name=matrix::["`seq -s, 1 
$SPARK_BENCHMARK_NUM_SPLITS`"]"
+  run: echo "matrix=["`seq -s, 1 $SPARK_BENCHMARK_NUM_SPLITS`"]" >> 
$GITHUB_OUTPUT
 
   # Any TPC-DS related updates on this job need to be applied to tpcds-1g job 
of build_and_test.yml as well
   tpcds-1g-gen:
diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index e0adad54aed..f9b445e9bbd 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -103,16 +103,15 @@ jobs:
   \"k8s-integration-tests\" : \"true\",
 }"
   echo $precondition # For debugging
-  # GitHub Actions set-output doesn't take newlines
-  # 
https://github.community/t/set-output-truncates-multiline-strings/16852/3
-  precondition="${precondition//$'\n'/'%0A'}"
-  echo "::set-output name=required::$precondition"
+  # Remove `\n` to avoid "Invalid format" error
+  precondition="${precondition//$'\n'/}"
+  echo "required=$precondition" >> $GITHUB_OUTPUT
 else
   # This is usually set by scheduled jobs.
   precondition='${{ inputs.jobs }}'
   echo $precondition # For debugging
-  precondition="${precondition//$'\n'/'%0A'}"
-  echo "::set-output name=required::$precondition"
+  precondition="${precondition//$'\n'/}"
+  echo "required=$precondition" >> $GITHUB_OUTPUT
 fi
 - name: Generate infra image URL
   id: infra-image-outputs
@@ -121,7 +120,7 @@ jobs:
 REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr '[:upper:]' 
'[:lower:]')
 IMG_NAME="apache-spark-ci-image:${{ inputs.branch }}-${{ github.run_id 
}}"
 IMG_URL="ghcr.io/$REPO_OWNER/$IMG_NAME"
-echo ::set-output name=image_url::$IMG_URL
+echo "image_url=$IMG_URL" >> $GITHUB_OUTPUT
 
   # Build: build Spark and run the tests for specified modules.
   build:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
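
The migration in miniature: instead of emitting a `::set-output` command on
stdout, a step appends `name=value` lines to the file named by the
`GITHUB_OUTPUT` environment variable. A minimal sketch, assuming it runs
inside a GitHub Actions step (so the variable is set):

```python
import os

def set_output(name: str, value: str) -> None:
    # the file takes one name=value pair per line, so newlines in the value
    # must be stripped first; the same reason the commit removes `\n`
    value = value.replace("\n", "")
    with open(os.environ["GITHUB_OUTPUT"], "a") as f:
        f.write(f"{name}={value}\n")

set_output("matrix", "[1,2,3]")
```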



[spark] branch master updated: [SPARK-40859][INFRA] Upgrade action/checkout to v3 to cleanup warning

2022-10-20 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 17efe044fa7 [SPARK-40859][INFRA] Upgrade action/checkout to v3 to 
cleanup warning
17efe044fa7 is described below

commit 17efe044fa7d366fa0beafe71c5e76d46f942b7e
Author: Yikun Jiang 
AuthorDate: Fri Oct 21 10:36:00 2022 +0800

[SPARK-40859][INFRA] Upgrade action/checkout to v3 to cleanup warning

### What changes were proposed in this pull request?
Upgrade actions/checkout to v3 (points to v3.1 now).

### Why are the changes needed?
- https://github.com/actions/checkout/releases/tag/v3.1.0 cleanup "[The 
'set-output' command is deprecated and will be disabled 
soon.](https://github.com/actions/checkout/issues/959#issuecomment-1282107197)"
- https://github.com/actions/checkout/releases/tag/v3.0.0 since v3, use the 
node 16 to cleanup "[Node.js 12 actions are 
deprecated](https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/)"

According to
https://github.com/actions/checkout/issues/959#issuecomment-1282107197, v2.5
also addresses the 'set-output' warning, but only v3 supports Node 16, so we
upgrade to v3.1 rather than v2.5.

### Does this PR introduce _any_ user-facing change?
No, dev only

### How was this patch tested?
CI passed

    Closes #38322 from Yikun/checkout-v3.
    
    Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/benchmark.yml|  6 +++---
 .github/workflows/build_and_test.yml   | 24 
 .github/workflows/build_infra_images_cache.yml |  2 +-
 .github/workflows/publish_snapshot.yml |  2 +-
 4 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
index 52adec20e5c..5508227b8b2 100644
--- a/.github/workflows/benchmark.yml
+++ b/.github/workflows/benchmark.yml
@@ -65,7 +65,7 @@ jobs:
   SPARK_LOCAL_IP: localhost
 steps:
   - name: Checkout Spark repository
-uses: actions/checkout@v2
+uses: actions/checkout@v3
 # In order to get diff files
 with:
   fetch-depth: 0
@@ -95,7 +95,7 @@ jobs:
   key: tpcds-${{ hashFiles('.github/workflows/benchmark.yml', 
'sql/core/src/test/scala/org/apache/spark/sql/TPCDSSchema.scala') }}
   - name: Checkout tpcds-kit repository
 if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
-uses: actions/checkout@v2
+uses: actions/checkout@v3
 with:
   repository: databricks/tpcds-kit
   ref: 2a5078a782192ddb6efbcead8de9973d6ab4f069
@@ -133,7 +133,7 @@ jobs:
   SPARK_TPCDS_DATA: ${{ github.workspace }}/tpcds-sf-1
 steps:
 - name: Checkout Spark repository
-  uses: actions/checkout@v2
+  uses: actions/checkout@v3
   # In order to get diff files
   with:
 fetch-depth: 0
diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 12a1ad0e71e..e0adad54aed 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -63,7 +63,7 @@ jobs:
 }}
 steps:
 - name: Checkout Spark repository
-  uses: actions/checkout@v2
+  uses: actions/checkout@v3
   with:
 fetch-depth: 0
 repository: apache/spark
@@ -195,7 +195,7 @@ jobs:
   SPARK_LOCAL_IP: localhost
 steps:
 - name: Checkout Spark repository
-  uses: actions/checkout@v2
+  uses: actions/checkout@v3
   # In order to fetch changed files
   with:
 fetch-depth: 0
@@ -286,7 +286,7 @@ jobs:
   username: ${{ github.actor }}
   password: ${{ secrets.GITHUB_TOKEN }}
   - name: Checkout Spark repository
-uses: actions/checkout@v2
+uses: actions/checkout@v3
 # In order to fetch changed files
 with:
   fetch-depth: 0
@@ -349,7 +349,7 @@ jobs:
   METASPACE_SIZE: 1g
 steps:
 - name: Checkout Spark repository
-  uses: actions/checkout@v2
+  uses: actions/checkout@v3
   # In order to fetch changed files
   with:
 fetch-depth: 0
@@ -438,7 +438,7 @@ jobs:
   SKIP_MIMA: true
 steps:
 - name: Checkout Spark repository
-  uses: actions/checkout@v2
+  uses: actions/checkout@v3
   # In order to fetch changed files
   with:
 fetch-depth: 0
@@ -508,7 +508,7 @@ jobs:
   image: ${{ needs.precondition.outputs.image_url }}
 steps:
 - name: Checkout Spark repository
-  uses: actions/checkout@v2
+  uses: actions/checkout@v3
   with:
 fetch-depth: 0
 repository: apache/spark
@@ -635,7 +635,7 @@ jobs:
 runs-on: ubuntu-20.04
 steps:
 - name: Check

[spark] branch master updated: [SPARK-40838][INFRA][TESTS] Upgrade infra base image to focal-20220922 and fix ps.mlflow doctest

2022-10-20 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2698d6bf10b [SPARK-40838][INFRA][TESTS] Upgrade infra base image to 
focal-20220922 and fix ps.mlflow doctest
2698d6bf10b is described below

commit 2698d6bf10b92e71e8af88fedb4e7c9e0f304416
Author: Yikun Jiang 
AuthorDate: Thu Oct 20 15:54:18 2022 +0800

[SPARK-40838][INFRA][TESTS] Upgrade infra base image to focal-20220922 and 
fix ps.mlflow doctest

### What changes were proposed in this pull request?
Upgrade infra base image to focal-20220922 and fix ps.mlflow doctest

### Why are the changes needed?
- Upgrade the infra base image to `focal-20220922` (the latest Ubuntu 20.04
tag at the time)
- Infra image Python packages updated:
  - numpy 1.23.3 --> 1.23.4
  - mlflow 1.28.0 --> 1.29.0
  - matplotlib 3.5.3 --> 3.6.1
  - pip 22.2.2 --> 22.3
  - scipy 1.9.1 --> 1.9.3

  Full list: https://www.diffchecker.com/e6eZZaYn
- Fix the ps.mlflow doctest (due to the mlflow upgrade):
```
**********************************************************************
File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 158, in 
pyspark.pandas.mlflow.load_model
Failed example:
with mlflow.start_run():
lr = LinearRegression()
lr.fit(train_x, train_y)
mlflow.sklearn.log_model(lr, "model")
Expected:
LinearRegression(...)
Got:
LinearRegression()

```

### Does this PR introduce _any_ user-facing change?
No, dev only

### How was this patch tested?
All CI passed

Closes #38304 from Yikun/SPARK-40838.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 dev/infra/Dockerfile| 4 ++--
 python/pyspark/pandas/mlflow.py | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index ccf0c932b0e..2a70bd3f98f 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -17,9 +17,9 @@
 
 # Image for building and testing Spark branches. Based on Ubuntu 20.04.
 # See also in https://hub.docker.com/_/ubuntu
-FROM ubuntu:focal-20220801
+FROM ubuntu:focal-20220922
 
-ENV FULL_REFRESH_DATE 20220706
+ENV FULL_REFRESH_DATE 20221019
 
 ENV DEBIAN_FRONTEND noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN true
diff --git a/python/pyspark/pandas/mlflow.py b/python/pyspark/pandas/mlflow.py
index 094215743e2..469349b37ee 100644
--- a/python/pyspark/pandas/mlflow.py
+++ b/python/pyspark/pandas/mlflow.py
@@ -159,7 +159,7 @@ def load_model(
 ... lr = LinearRegression()
 ... lr.fit(train_x, train_y)
 ... mlflow.sklearn.log_model(lr, "model")
-LinearRegression(...)
+LinearRegression...
 
 Now that our model is logged using MLflow, we load it back and apply it on 
a pandas-on-Spark
 dataframe:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
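
The mechanics of the doctest fix in isolation: with doctest's ELLIPSIS
option, a trailing `...` also absorbs any extra output printed after the
repr, which the old `LinearRegression(...)` form could not. A toy example,
where the second output line merely stands in for whatever newer mlflow adds:

```python
import doctest

def demo():
    """
    >>> print("LinearRegression()\\nModelInfo")  # doctest: +ELLIPSIS
    LinearRegression...
    """

# runs the docstring example above and reports the match
doctest.run_docstring_examples(demo, {}, verbose=True)
```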



[spark-docker] branch master updated: [SPARK-40845] Add template support for SPARK_GPG_KEY and fix GPG verify

2022-10-20 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 896a36e  [SPARK-40845] Add template support for SPARK_GPG_KEY and fix 
GPG verify
896a36e is described below

commit 896a36e36c094bf1480f4819005e2982ea8af417
Author: Yikun Jiang 
AuthorDate: Thu Oct 20 15:38:03 2022 +0800

[SPARK-40845] Add template support for SPARK_GPG_KEY and fix GPG verify

### What changes were proposed in this pull request?
This patch:
- Add template support for `SPARK_GPG_KEY`.
- Fix a bug in GPG verification. (Change the trailing `||` to `;`)
- Use openpgp.org instead of gpg.com because the keys are uploaded there as
part of the [Spark release process](https://spark.apache.org/release-process.html).

### Why are the changes needed?
Each release has its own GPG key to verify against, so the GPG key needs to
be set per version.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed
Ran `./add-dockerfiles.sh 3.3.0` and checked that the GPG key is set correctly

Closes #16 from Yikun/GPG.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 6 +++---
 3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile   | 6 +++---
 3.3.0/scala2.12-java11-r-ubuntu/Dockerfile | 6 +++---
 3.3.0/scala2.12-java11-ubuntu/Dockerfile   | 6 +++---
 Dockerfile.template| 6 +++---
 tools/template.py  | 6 ++
 6 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile 
b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
index be9cbb0..ac16bdd 100644
--- a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -45,7 +45,7 @@ RUN set -ex && \
 # https://downloads.apache.org/spark/KEYS
 ENV 
SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
 \
 
SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc
 \
-GPG_KEY=E298A3A825C0D65DFD57CBB651716619E084DAB9
+GPG_KEY=80FB8EBE8EBA68504989703491B5DC815DBF10D3
 
 RUN set -ex; \
 export SPARK_TMP="$(mktemp -d)"; \
@@ -53,8 +53,8 @@ RUN set -ex; \
 wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
 wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
 export GNUPGHOME="$(mktemp -d)"; \
-gpg --keyserver hkps://keyserver.pgp.com --recv-key "$GPG_KEY" || \
-gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY" || \
+gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \
+gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \
 gpg --batch --verify spark.tgz.asc spark.tgz; \
 gpgconf --kill all; \
 rm -rf "$GNUPGHOME" spark.tgz.asc; \
diff --git a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile 
b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
index 096c7eb..c6e433d 100644
--- a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
@@ -44,7 +44,7 @@ RUN set -ex && \
 # https://downloads.apache.org/spark/KEYS
 ENV 
SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
 \
 
SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc
 \
-GPG_KEY=E298A3A825C0D65DFD57CBB651716619E084DAB9
+GPG_KEY=80FB8EBE8EBA68504989703491B5DC815DBF10D3
 
 RUN set -ex; \
 export SPARK_TMP="$(mktemp -d)"; \
@@ -52,8 +52,8 @@ RUN set -ex; \
 wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
 wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
 export GNUPGHOME="$(mktemp -d)"; \
-gpg --keyserver hkps://keyserver.pgp.com --recv-key "$GPG_KEY" || \
-gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY" || \
+gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \
+gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \
 gpg --batch --verify spark.tgz.asc spark.tgz; \
 gpgconf --kill all; \
 rm -rf "$GNUPGHOME" spark.tgz.asc; \
diff --git a/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile 
b/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile
index 2e085a2..975e444 100644
--- a/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile
@@ -42,7 +42,7 @@ RUN set -ex && \
 # https://downloads.apache.org/spark/KEYS
 ENV 
SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
 \
 
SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/
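
The `||` -> `;` fix matters because, in the old chain, `gpg --batch --verify`
was the last branch of the fallback and was skipped entirely whenever a
keyserver fetch succeeded. A sketch of the intended logic, assuming `gpg` is
on PATH:

```python
import subprocess

KEYSERVERS = ["hkps://keys.openpgp.org", "hkps://keyserver.ubuntu.com"]
GPG_KEY = "80FB8EBE8EBA68504989703491B5DC815DBF10D3"

def fetch_key() -> None:
    # `||` semantics: try each keyserver until one succeeds
    for server in KEYSERVERS:
        if subprocess.run(
            ["gpg", "--keyserver", server, "--recv-keys", GPG_KEY]
        ).returncode == 0:
            return
    raise RuntimeError("could not fetch the GPG key from any keyserver")

fetch_key()
# verification runs unconditionally and is allowed to fail hard
subprocess.run(
    ["gpg", "--batch", "--verify", "spark.tgz.asc", "spark.tgz"], check=True)
```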

[spark-docker] branch master updated: [SPARK-40833] Cleanup apt lists cache

2022-10-18 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 95f5a1f  [SPARK-40833] Cleanup apt lists cache
95f5a1f is described below

commit 95f5a1f3e846ad3b6550e151fa76b70f6fe0b946
Author: Yikun Jiang 
AuthorDate: Wed Oct 19 10:17:58 2022 +0800

[SPARK-40833] Cleanup apt lists cache

### What changes were proposed in this pull request?
Remove unused apt lists cache and apply `./add-dockerfiles.sh 3.3.0`

### Why are the changes needed?
Clean the cache to reduce the Docker image size.

This is also
[recommended](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#run)
by the Docker community:

```
$ docker run --user 0:0 -ti apache/spark bash
root@5d1ca347279e:/opt/spark/work-dir# ls /var/lib/apt/lists/
auxfiles
 lock
deb.debian.org_debian_dists_bullseye-updates_InRelease  
 partial
deb.debian.org_debian_dists_bullseye-updates_main_binary-arm64_Packages.lz4 
 security.debian.org_debian-security_dists_bullseye-security_InRelease
deb.debian.org_debian_dists_bullseye_InRelease  
 
security.debian.org_debian-security_dists_bullseye-security_main_binary-arm64_Packages.lz4
deb.debian.org_debian_dists_bullseye_main_binary-arm64_Packages.lz4
root@5d1ca347279e:/opt/spark/work-dir# du --max-depth=1 -h 
/var/lib/apt/lists/
4.0K/var/lib/apt/lists/partial
4.0K/var/lib/apt/lists/auxfiles
17M /var/lib/apt/lists/
```

### Does this PR introduce _any_ user-facing change?
Yes, to some extent: the image size is reduced.

### How was this patch tested?
K8s CI passed

Closes #14 from Yikun/clean-apt-list.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 3 ++-
 3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile   | 3 ++-
 3.3.0/scala2.12-java11-r-ubuntu/Dockerfile | 3 ++-
 3.3.0/scala2.12-java11-ubuntu/Dockerfile   | 3 ++-
 Dockerfile.template| 3 ++-
 5 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile 
b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
index 5dbc973..be9cbb0 100644
--- a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -38,7 +38,8 @@ RUN set -ex && \
 ln -sv /bin/bash /bin/sh && \
 echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
 chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
-rm -rf /var/cache/apt/*
+rm -rf /var/cache/apt/* && \
+rm -rf /var/lib/apt/lists/*
 
 # Install Apache Spark
 # https://downloads.apache.org/spark/KEYS
diff --git a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile 
b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
index 85e06ce..096c7eb 100644
--- a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
@@ -37,7 +37,8 @@ RUN set -ex && \
 ln -sv /bin/bash /bin/sh && \
 echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
 chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
-rm -rf /var/cache/apt/*
+rm -rf /var/cache/apt/* && \
+rm -rf /var/lib/apt/lists/*
 
 # Install Apache Spark
 # https://downloads.apache.org/spark/KEYS
diff --git a/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile 
b/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile
index 753d585..2e085a2 100644
--- a/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile
@@ -35,7 +35,8 @@ RUN set -ex && \
 ln -sv /bin/bash /bin/sh && \
 echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
 chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
-rm -rf /var/cache/apt/*
+rm -rf /var/cache/apt/* && \
+rm -rf /var/lib/apt/lists/*
 
 # Install Apache Spark
 # https://downloads.apache.org/spark/KEYS
diff --git a/3.3.0/scala2.12-java11-ubuntu/Dockerfile 
b/3.3.0/scala2.12-java11-ubuntu/Dockerfile
index 1e4c604..5858e2d 100644
--- a/3.3.0/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-ubuntu/Dockerfile
@@ -34,7 +34,8 @@ RUN set -ex && \
 ln -sv /bin/bash /bin/sh && \
 echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
 chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
-rm -rf /var/cache/apt/*
+rm -rf /var/cache/apt/* && \
+rm -rf /var/lib/apt/lists/*
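
To reproduce the size check from the commit message without `du`, a small
sketch that sums what `/var/lib/apt/lists` contributes, meant to be run
inside the container before the cleanup:

```python
import os

def dir_size_mib(path: str) -> float:
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # ignore files that disappear mid-walk
    return total / (1024 * 1024)

print(f"{dir_size_mib('/var/lib/apt/lists'):.1f} MiB")
```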
 

[spark-docker] branch master updated: [SPARK-40832][DOCS] Add README for spark-docker

2022-10-18 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new c1353a3  [SPARK-40832][DOCS] Add README for spark-docker
c1353a3 is described below

commit c1353a377176d9f2a84641323840130bd160e436
Author: Yikun Jiang 
AuthorDate: Wed Oct 19 10:16:41 2022 +0800

[SPARK-40832][DOCS] Add README for spark-docker

### What changes were proposed in this pull request?
Add README for spark-docker

### Why are the changes needed?
Although the DOI PR has not been merged yet, we'd better briefly explain
what this repository does.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview manually:

https://user-images.githubusercontent.com/1736354/196381318-cb3d72e1-1ba7-479c-82cb-4412dde91179.png

Closes #13 from Yikun/readme.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 README.md | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/README.md b/README.md
new file mode 100644
index 000..87286dc
--- /dev/null
+++ b/README.md
@@ -0,0 +1,18 @@
+# Apache Spark Official Dockerfiles
+
+## What is Apache Spark?
+
+Spark is a unified analytics engine for large-scale data processing. It 
provides
+high-level APIs in Scala, Java, Python, and R, and an optimized engine that
+supports general computation graphs for data analysis. It also supports a
+rich set of higher-level tools including Spark SQL for SQL and DataFrames,
+pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX 
for graph processing,
+and Structured Streaming for stream processing.
+
+https://spark.apache.org/
+
+## About this repository
+
+This repository contains the Dockerfiles used to build the Apache Spark Docker 
Image.
+
+See more in [SPARK-40513: SPIP: Support Docker Official Image for 
Spark](https://issues.apache.org/jira/browse/SPARK-40513).


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-docker] branch master updated: [SPARK-40528] Support dockerfile template

2022-10-17 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 6459e3d  [SPARK-40528] Support dockerfile template
6459e3d is described below

commit 6459e3d09a2e009573be355e63c404bb35139d28
Author: Yikun Jiang 
AuthorDate: Mon Oct 17 16:23:23 2022 +0800

[SPARK-40528] Support dockerfile template

### What changes were proposed in this pull request?
This patch:
- Add dockerfile template: `Dockerfile.template` contains 3 vars: 
`BASE_IMAGE` for base image name, `HAVE_PY` for adding python support, `HAVE_R` 
for adding sparkr support.
- Add a script, `add-dockerfiles.sh`; you can run `./add-dockerfiles.sh 3.3.0`
- Add a tool, `template.py`, to help generate Dockerfiles from the Jinja
template.

### Why are the changes needed?
Generate the dockerfiles to make life easier.

### Does this PR introduce _any_ user-facing change?
No, dev only.

### How was this patch tested?
```shell
# Prepare new env
python3 -m venv ~/xxx
source ~/xxx/bin/activate
pip install -r ./tools/requirements.txt

# Generate 3.3.0
./add-dockerfiles.sh 3.3.0

# no diff
git diff
```

lint:
```
$ flake8 ./tools/template.py
$ black ./tools/template.py
All done! ✨  ✨
1 file left unchanged.
```

Closes #12 from Yikun/SPARK-40528.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 Dockerfile.template|  98 ++
 add-dockerfiles.sh |  53 +++
 entrypoint.sh.template | 114 +
 tools/requirements.txt |   1 +
 tools/template.py  |  84 
 5 files changed, 350 insertions(+)

diff --git a/Dockerfile.template b/Dockerfile.template
new file mode 100644
index 000..2001281
--- /dev/null
+++ b/Dockerfile.template
@@ -0,0 +1,98 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+FROM {{ BASE_IMAGE }}
+
+ARG spark_uid=185
+
+RUN groupadd --system --gid=${spark_uid} spark && \
+useradd --system --uid=${spark_uid} --gid=spark spark
+
+RUN set -ex && \
+apt-get update && \
+ln -s /lib /lib64 && \
+apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user 
libnss3 procps net-tools gosu && \
+{%- if HAVE_PY %}
+apt install -y python3 python3-pip && \
+pip3 install --upgrade pip setuptools && \
+{%- endif %}
+{%- if HAVE_R %}
+apt install -y r-base r-base-dev && \
+{%- endif %}
+mkdir -p /opt/spark && \
+{%- if HAVE_PY %}
+mkdir /opt/spark/python && \
+{%- endif %}
+mkdir -p /opt/spark/examples && \
+mkdir -p /opt/spark/work-dir && \
+touch /opt/spark/RELEASE && \
+chown -R spark:spark /opt/spark && \
+rm /bin/sh && \
+ln -sv /bin/bash /bin/sh && \
+echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
+chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
+rm -rf /var/cache/apt/*
+
+# Install Apache Spark
+# https://downloads.apache.org/spark/KEYS
+ENV SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-{{ SPARK_VERSION 
}}/spark-{{ SPARK_VERSION }}-bin-hadoop3.tgz \
+SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/spark-{{ 
SPARK_VERSION }}/spark-{{ SPARK_VERSION }}-bin-hadoop3.tgz.asc \
+GPG_KEY=E298A3A825C0D65DFD57CBB651716619E084DAB9
+
+RUN set -ex; \
+export SPARK_TMP="$(mktemp -d)"; \
+cd $SPARK_TMP; \
+wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
+wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
+export GNUPGHOME="$(mktemp -d)"; \
+gpg --keyserver hkps://keyserver.pgp.com --recv-key "$GPG_KEY" || \
+gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY" || \
+gpg --batch --verify spark.tgz.asc spark.tgz
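
In miniature, the rendering step `tools/template.py` performs: feed the
template vars into Jinja and print the resulting Dockerfile. A sketch
assuming jinja2 from tools/requirements.txt and the repo root as the working
directory; the base image string is only an example value:

```python
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("."))
dockerfile = env.get_template("Dockerfile.template").render(
    BASE_IMAGE="eclipse-temurin:11-jre-focal",  # example, not the default
    HAVE_PY=True,    # emit the python3/pip block
    HAVE_R=False,    # skip the r-base block
    SPARK_VERSION="3.3.0",  # also interpolated into the download URLs
)
print(dockerfile)
```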

[spark-docker] branch master updated: [SPARK-40805] Use `spark` username in official image

2022-10-17 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new a75ecb1  [SPARK-40805] Use `spark` username in official image
a75ecb1 is described below

commit a75ecb13dee5580a149f2ef0bd9f8a4371d3d956
Author: Yikun Jiang 
AuthorDate: Mon Oct 17 16:09:30 2022 +0800

[SPARK-40805] Use `spark` username in official image

### What changes were proposed in this pull request?
This patch:
- Add the spark uid/gid in the Dockerfile (useradd and groupadd), which the
entrypoint relies on. This approach is also used by [other
DOIs](https://github.com/search?p=2&q=org%3Adocker-library+useradd&type=Code)
and Apache DOIs (such as
[zookeeper](https://github.com/31z4/zookeeper-docker/blob/master/3.8.0/Dockerfile#L17-L21),
[solr](https://github.com/apache/solr-docker/blob/a20477ed123cd1a72132aebcc0742cee46b5f976/9.0/Dockerfile#L108-L110),
[flink](https://github.com/apache/flink-docker/blob/master/1.15/sc [...]
- Switch to the `spark` user in `entrypoint.sh` rather than in the
Dockerfile. (make sure the Spark process is executed as a non-root user)
- Remove the `USER` setting in the Dockerfile. (make sure a derived image has
permission to extend the Dockerfile, such as executing `apt update`)
- Chown scripts to `spark:spark` instead of `root:root`. (avoid permission
issues, such as in standalone mode)
- Add a `gosu` dependency, a `sudo` replacement recommended by
[docker](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user)
and [docker official
images](https://github.com/docker-library/official-images/blob/9a4d54f1a42ea82970baa4e6f3d0bc75e98fc961/README.md#consistency),
and also used by other DOI images.

This change also follow the rules of docker official images, see also 
[consistency](https://github.com/docker-library/official-images/blob/9a4d54f1a42ea82970baa4e6f3d0bc75e98fc961/README.md#consistency)
 and [dockerfile best practices about 
user](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user).

### Why are the changes needed?

The below issues are what I have found so far

1. **Irregular login username**
  The image's login username is not very standard; being dropped into a
`185` username by `docker run` is a little odd.

  ```
  $ docker run -ti apache/spark bash
  185@d88a24357413:/opt/spark/work-dir$
  ```

2. **Permission issue of spark sbin**
There are also permission issues when running some Spark scripts, such as
standalone mode:

  ```
  $ docker run -ti apache/spark /opt/spark/sbin/start-master.sh

  mkdir: cannot create directory ‘/opt/spark/logs’: Permission denied
  chown: cannot access '/opt/spark/logs': No such file or directory
  starting org.apache.spark.deploy.master.Master, logging to 
/opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out
  /opt/spark/sbin/spark-daemon.sh: line 135: 
/opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out:
 No such file or directory
  failed to launch: nice -n 0 /opt/spark/bin/spark-class 
org.apache.spark.deploy.master.Master --host 1c345a00e312 --port 7077 
--webui-port 8080
  tail: cannot open 
'/opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out'
 for reading: No such file or directory
  full log in 
/opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out
  ```

  

3. **Using spark as a base image is not well supported**
  Due to the static `USER` set in the Dockerfile.
  ```
  $ cat Dockerfile
  FROM apache/spark
  RUN apt update

  $  docker build -t spark-test:1015 .
  // ...
  --
   > [2/2] RUN apt update:
  #5 0.405 E: Could not open lock file /var/lib/apt/lists/lock - open (13: 
Permission denied)
  #5 0.405 E: Unable to lock directory /var/lib/apt/lists/
  --
  executor failed running [/bin/sh -c apt update]: exit code: 100

  ```

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
- CI passed: all K8s tests

- Regression test:
```
# Username is set to spark rather than 185
docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash
spark@27bbfca0a581:/opt/spark/work-dir$
```
```
# start-master.sh no permission issue
$ docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash

spark@8d1118e26766:~/work-dir$ /opt/spark/sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to 
/opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-8d1118e26766.out
```
```
# Image as parent case
$ cat Dockerfile
FROM spark:scala2.12-java11-python3-r-ubuntu
RUN apt update
$ docker build -t spark-test:1
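```

What `gosu spark:spark` buys the entrypoint, sketched in Python: start as
root so extending the image still works, then drop to the unprivileged user
before exec'ing the real command. A rough, Linux-only illustration, not the
actual entrypoint:

```python
import grp
import os
import pwd
import sys

def run_as_spark(argv: list) -> None:
    if os.getuid() == 0:  # only drop privileges when starting as root
        gid = grp.getgrnam("spark").gr_gid
        uid = pwd.getpwnam("spark").pw_uid
        os.setgroups([gid])  # drop supplementary groups first
        os.setgid(gid)       # set group before user, while still privileged
        os.setuid(uid)
    os.execvp(argv[0], argv)  # replace this process, like `exec` in shell

if __name__ == "__main__":
    run_as_spark(sys.argv[1:] or ["id"])
```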

[spark-docker] branch master updated: [SPARK-40783][INFRA] Enable Spark on K8s integration test

2022-10-13 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 3037f75  [SPARK-40783][INFRA] Enable Spark on K8s integration test
3037f75 is described below

commit 3037f75a88ca7ea57746c7d1bf49c125a828f56e
Author: Yikun Jiang 
AuthorDate: Fri Oct 14 11:57:01 2022 +0800

[SPARK-40783][INFRA] Enable Spark on K8s integration test

### What changes were proposed in this pull request?
This patch enable the Spark on K8s integration test:

- **scala2.12-java11-python3-ubuntu**: Run scala / PySpark basic test
- **scala2.12-java11-ubuntu**: Run scala basic test
- **scala2.12-java11-r-ubuntu**: Run scala / SparkR basic test
- **scala2.12-java11-python3-r-ubuntu**: Run all K8s integration test

Currently, we use the local registry as a bridge between build and test:
https://user-images.githubusercontent.com/1736354/195758243-abfbea7f-05e9-4678-a3a5-cfd38cc1b8f5.png;>

- Build: generate the image and push it to the local registry
- Test: load it into minikube's Docker and run the K8s tests using the
specific image

Because multi-platform images cannot be exported with the `docker` export
type, the local registry (push) is used here rather than a local build
(load). Compared to `ghcr`, it reduces the network transmission and the
permissions required.

Also:
- Upgrade `setup-qemu-action` to v2
- Upgrade `setup-buildx-action` to v2
- Remove unused `Image digest` step

### Why are the changes needed?
To ensure the quality of official dockerfiles.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes #9 from Yikun/enable-k8s-it.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/main.yml | 142 -
 1 file changed, 129 insertions(+), 13 deletions(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 7972703..b47245b 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -41,6 +41,15 @@ on:
 jobs:
   main:
 runs-on: ubuntu-latest
+# Due to the multi-platform images cannot be exported with the `docker` 
export type,
+# https://github.com/docker/buildx/issues/59
+# So, the local registry (push) is used here rather than local build 
(load):
+# 
https://github.com/docker/build-push-action/blob/master/docs/advanced/local-registry.md
+services:
+  registry:
+image: registry:2
+ports:
+  - 5000:5000
 strategy:
   matrix:
 spark_version:
@@ -55,29 +64,26 @@ jobs:
 uses: actions/checkout@v2
 
   - name: Set up QEMU
-uses: docker/setup-qemu-action@v1
+uses: docker/setup-qemu-action@v2
 
   - name: Set up Docker Buildx
-uses: docker/setup-buildx-action@v1
-
-  - name: Login to GHCR
-uses: docker/login-action@v2
+uses: docker/setup-buildx-action@v2
 with:
-  registry: ghcr.io
-  username: ${{ github.actor }}
-  password: ${{ secrets.GITHUB_TOKEN }}
+  # This required by local registry
+  driver-opts: network=host
 
   - name: Generate tags
 run: |
   TAG=scala${{ matrix.scala_version }}-java${{ matrix.java_version 
}}-${{ matrix.image_suffix }}
 
   REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr '[:upper:]' 
'[:lower:]')
-  TEST_REPO=ghcr.io/$REPO_OWNER/spark-docker
+  TEST_REPO=localhost:5000/$REPO_OWNER/spark-docker
   IMAGE_NAME=spark
   IMAGE_PATH=${{ matrix.spark_version }}/$TAG
   UNIQUE_IMAGE_TAG=${{ matrix.spark_version }}-$TAG
+  IMAGE_URL=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG
 
-  # Unique image tag in each version: scala2.12-java11-python3-ubuntu
+  # Unique image tag in each version: 
3.3.0-scala2.12-java11-python3-ubuntu
   echo "UNIQUE_IMAGE_TAG=${UNIQUE_IMAGE_TAG}" >> $GITHUB_ENV
   # Test repo: ghcr.io/apache/spark-docker
   echo "TEST_REPO=${TEST_REPO}" >> $GITHUB_ENV
@@ -85,6 +91,8 @@ jobs:
   echo "IMAGE_NAME=${IMAGE_NAME}" >> $GITHUB_ENV
   # Image dockerfile path: 3.3.0/scala2.12-java11-python3-ubuntu
   echo "IMAGE_PATH=${IMAGE_PATH}" >> $GITHUB_ENV
+  # Image URL: 
ghcr.io/apache/spark-docker/spark:3.3.0-scala2.12-java11-python3-ubuntu
+  echo "IMAGE_URL=${IMAGE_URL}" >> $GITHUB_ENV
 
   - name: Print Image tags
 run: |
@@ -92,13 +100,121 @@ jobs:
   echo "TEST_REPO: "${TEST_REPO}
   echo "IMAGE_NAME: "${IMAGE_NAME}
   echo "IMAGE_PATH: "${IMAGE_PATH}
+
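
The naming scheme assembled by the "Generate tags" step, reduced to plain
string building; the matrix values below are just one concrete combination:

```python
spark_version = "3.3.0"
scala_version, java_version = "2.12", "11"
image_suffix = "python3-ubuntu"
repo_owner = "apache"

tag = f"scala{scala_version}-java{java_version}-{image_suffix}"
test_repo = f"localhost:5000/{repo_owner}/spark-docker"  # the local registry
unique_image_tag = f"{spark_version}-{tag}"  # one tag per Spark version
image_path = f"{spark_version}/{tag}"        # where the Dockerfile lives
image_url = f"{test_repo}/spark:{unique_image_tag}"

print(image_url)
# localhost:5000/apache/spark-docker/spark:3.3.0-scala2.12-java11-python3-ubuntu
```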

[spark-docker] branch master updated: [SPARK-40754][DOCS] Add LICENSE and NOTICE

2022-10-13 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new fc07aed  [SPARK-40754][DOCS] Add LICENSE and NOTICE
fc07aed is described below

commit fc07aeda1f48eb2aae9a441dfe94ae95f697e222
Author: Yikun Jiang 
AuthorDate: Thu Oct 13 21:47:15 2022 +0800

[SPARK-40754][DOCS] Add LICENSE and NOTICE

### What changes were proposed in this pull request?
This patch adds LICENSE and NOTICE:
- LICENSE: https://www.apache.org/licenses/LICENSE-2.0.txt
- NOTICE: https://github.com/apache/spark/blob/master/NOTICE

### Why are the changes needed?
https://www.apache.org/licenses/LICENSE-2.0#apply

See also: 
https://github.com/apache/spark-docker/pull/2#issuecomment-1274807917

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No need

Closes #6 from Yikun/SPARK-40754.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 LICENSE | 202 
 NOTICE  |   6 ++
 2 files changed, 208 insertions(+)

diff --git a/LICENSE b/LICENSE
new file mode 100644
index 000..d645695
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,202 @@
+
+ Apache License
+   Version 2.0, January 2004
+http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+  "License" shall mean the terms and conditions for use, reproduction,
+  and distribution as defined by Sections 1 through 9 of this document.
+
+  "Licensor" shall mean the copyright owner or entity authorized by
+  the copyright owner that is granting the License.
+
+  "Legal Entity" shall mean the union of the acting entity and all
+  other entities that control, are controlled by, or are under common
+  control with that entity. For the purposes of this definition,
+  "control" means (i) the power, direct or indirect, to cause the
+  direction or management of such entity, whether by contract or
+  otherwise, or (ii) ownership of fifty percent (50%) or more of the
+  outstanding shares, or (iii) beneficial ownership of such entity.
+
+  "You" (or "Your") shall mean an individual or Legal Entity
+  exercising permissions granted by this License.
+
+  "Source" form shall mean the preferred form for making modifications,
+  including but not limited to software source code, documentation
+  source, and configuration files.
+
+  "Object" form shall mean any form resulting from mechanical
+  transformation or translation of a Source form, including but
+  not limited to compiled object code, generated documentation,
+  and conversions to other media types.
+
+  "Work" shall mean the work of authorship, whether in Source or
+  Object form, made available under the License, as indicated by a
+  copyright notice that is included in or attached to the work
+  (an example is provided in the Appendix below).
+
+  "Derivative Works" shall mean any work, whether in Source or Object
+  form, that is based on (or derived from) the Work and for which the
+  editorial revisions, annotations, elaborations, or other modifications
+  represent, as a whole, an original work of authorship. For the purposes
+  of this License, Derivative Works shall not include works that remain
+  separable from, or merely link (or bind by name) to the interfaces of,
+  the Work and Derivative Works thereof.
+
+  "Contribution" shall mean any work of authorship, including
+  the original version of the Work and any modifications or additions
+  to that Work or Derivative Works thereof, that is intentionally
+  submitted to Licensor for inclusion in the Work by the copyright owner
+  or by an individual or Legal Entity authorized to submit on behalf of
+  the copyright owner. For the purposes of this definition, "submitted"
+  means any form of electronic, verbal, or written communication sent
+  to the Licensor or its representatives, including but not limited to
+  communication on electronic mailing lists, source code control systems,
+  and issue tracking systems that are managed by, or on behalf of, the
+  Licensor for the purpose of discussing and improving the Work, but
+  excluding communication that is conspicuously marked or otherwise
+  designated in writing by the copyright owner as "Not a Contribution."
+
+  "Contributor" shall mean Licensor and any individual or Legal Entity
+  on b

[spark-docker] branch master updated: [SPARK-40746][INFRA] Fix Dockerfile build workflow

2022-10-11 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new c116698  [SPARK-40746][INFRA] Fix Dockerfile build workflow
c116698 is described below

commit c11669850c0c03212df6d5c84c01050e6c933076
Author: Yikun Jiang 
AuthorDate: Wed Oct 12 10:48:51 2022 +0800

[SPARK-40746][INFRA] Fix Dockerfile build workflow

### What changes were proposed in this pull request?
This patch makes the workflow work in the apache repo:
- Add `.github/workflows/build_3.3.0.yaml` and `3.3.0/**` to trigger paths
- Change `apache/spark-docker:TAG` to 
`ghcr.io/apache/spark-docker/spark:TAG`
- Remove the push; we only need to build locally to validate the Dockerfile.
Even for the future K8s IT test we can refactor to use the minikube Docker
daemon, so the build can stay local.

### Why are the changes needed?
To make the workflow work well in the apache repo.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes: https://github.com/apache/spark-docker/pull/5

Closes #7 from Yikun/SPARK-40746.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/build_3.3.0.yaml | 3 ++-
 .github/workflows/main.yml | 3 +--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/build_3.3.0.yaml 
b/.github/workflows/build_3.3.0.yaml
index 63b1ab3..7e7ce39 100644
--- a/.github/workflows/build_3.3.0.yaml
+++ b/.github/workflows/build_3.3.0.yaml
@@ -24,7 +24,8 @@ on:
 branches:
   - 'master'
 paths:
-  - '3.3.0/'
+  - '3.3.0/**'
+  - '.github/workflows/build_3.3.0.yaml'
   - '.github/workflows/main.yml'
 
 jobs:
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 90bd706..7972703 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -97,8 +97,7 @@ jobs:
 uses: docker/build-push-action@v2
 with:
   context: ${{ env.IMAGE_PATH }}
-  push: true
-  tags: ${{ env.TEST_REPO }}:${{ env.UNIQUE_IMAGE_TAG }}
+  tags: ${{ env.TEST_REPO }}/${{ env.IMAGE_NAME }}:${{ 
env.UNIQUE_IMAGE_TAG }}
   platforms: linux/amd64,linux/arm64
 
   - name: Image digest


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-docker] branch master updated: [SPARK-40757][INFRA] Add PULL_REQUEST_TEMPLATE for spark-docker

2022-10-11 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 30fd82f  [SPARK-40757][INFRA] Add PULL_REQUEST_TEMPLATE for 
spark-docker
30fd82f is described below

commit 30fd82f313c4ecd44f4181e6a4cf2e1d9463c628
Author: Yikun Jiang 
AuthorDate: Wed Oct 12 10:47:31 2022 +0800

[SPARK-40757][INFRA] Add PULL_REQUEST_TEMPLATE for spark-docker

### What changes were proposed in this pull request?
Initialize with 
https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE and 
remove some unused notes

### Why are the changes needed?
Add PULL_REQUEST_TEMPLATE for `spark-docker`

### Does this PR introduce _any_ user-facing change?
No, dev only

### How was this patch tested?
Verified by new PRs opened after this is merged

Closes #8 from Yikun/SPARK-40757.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/PULL_REQUEST_TEMPLATE | 41 +
 1 file changed, 41 insertions(+)

diff --git a/.github/PULL_REQUEST_TEMPLATE b/.github/PULL_REQUEST_TEMPLATE
new file mode 100644
index 000..5268131
--- /dev/null
+++ b/.github/PULL_REQUEST_TEMPLATE
@@ -0,0 +1,41 @@
+
+
+### What changes were proposed in this pull request?
+
+
+
+### Why are the changes needed?
+
+
+
+### Does this PR introduce _any_ user-facing change?
+
+
+
+### How was this patch tested?
+


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-docker] branch master updated: [SPARK-40516] Add Apache Spark 3.3.0 Dockerfile

2022-10-10 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new e61aba1  [SPARK-40516] Add Apache Spark 3.3.0 Dockerfile
e61aba1 is described below

commit e61aba1ed4ca8e747f38cae5f6bd72a3a50f57cd
Author: Yikun Jiang 
AuthorDate: Tue Oct 11 10:45:57 2022 +0800

[SPARK-40516] Add Apache Spark 3.3.0 Dockerfile

### What changes were proposed in this pull request?
This patch adds Apache Spark 3.3.0 Dockerfile:
- 3.3.0-scala2.12-java11-python3-ubuntu: pyspark + scala
- 3.3.0-scala2.12-java11-ubuntu: scala
- 3.3.0-scala2.12-java11-r-ubuntu: sparkr + scala
- 3.3.0-scala2.12-java11-python3-r-ubuntu: All in one image

### Why are the changes needed?
This is needed by Docker Official Image

See also in: 
https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
**The action won't be triggered until the workflow is merged to the default 
branch**, so I can only test it in my local repo:

- local test: https://github.com/Yikun/spark-docker/pull/1

![image](https://user-images.githubusercontent.com/1736354/194975185-d5843c84-bbba-48d0-bbf0-363532c6712d.png)
- Dockerfile E2E K8S Local test: 
https://github.com/Yikun/spark-docker-bak/pull/7

![image](https://user-images.githubusercontent.com/1736354/194975267-6dca0de5-c715-4e0f-b735-22752b7912de.png)

Closes #2 from Yikun/SPARK-40516.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/build_3.3.0.yaml |  38 
 .github/workflows/main.yml | 105 
 3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile |  84 
 .../entrypoint.sh  | 107 +
 3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile   |  81 
 .../scala2.12-java11-python3-ubuntu/entrypoint.sh  | 107 +
 3.3.0/scala2.12-java11-r-ubuntu/Dockerfile |  79 +++
 3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh  | 107 +
 3.3.0/scala2.12-java11-ubuntu/Dockerfile   |  76 +++
 3.3.0/scala2.12-java11-ubuntu/entrypoint.sh| 107 +
 10 files changed, 891 insertions(+)

diff --git a/.github/workflows/build_3.3.0.yaml 
b/.github/workflows/build_3.3.0.yaml
new file mode 100644
index 000..63b1ab3
--- /dev/null
+++ b/.github/workflows/build_3.3.0.yaml
@@ -0,0 +1,38 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build and Test (3.3.0)"
+
+on:
+  pull_request:
+branches:
+  - 'master'
+paths:
+  - '3.3.0/'
+  - '.github/workflows/main.yml'
+
+jobs:
+  run-build:
+name: Run
+secrets: inherit
+uses: ./.github/workflows/main.yml
+with:
+  spark: 3.3.0
+  scala: 2.12
+  java: 11
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
new file mode 100644
index 000..90bd706
--- /dev/null
+++ b/.github/workflows/main.yml
@@ -0,0 +1,105 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: Main (Build/Test/Publish)
+
+on:
+  workflow_cal

[spark-docker] branch master updated: [SPARK-40727][INFRA] Add merge_spark_docker_pr.py

2022-10-10 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new fa2d1a5  [SPARK-40727][INFRA] Add merge_spark_docker_pr.py
fa2d1a5 is described below

commit fa2d1a59b6e47b1e4072154de0b1f215780af595
Author: Yikun Jiang 
AuthorDate: Mon Oct 10 20:34:43 2022 +0800

[SPARK-40727][INFRA] Add merge_spark_docker_pr.py

### What changes were proposed in this pull request?
This patch adds merge_spark_docker_pr.py to help merge `spark-docker`
commits and resolve the corresponding Spark JIRA issues.

The script is from 
https://github.com/apache/spark/blob/ef837ca71020950b841f9891c70dc4b29d968bf1/dev/merge_spark_pr.py

It changes `spark` to `spark-docker`:
https://github.com/apache/spark-docker/commit/e4107a74d348656041612ff68a647c6051894240

### Why are the changes needed?
Helps merge spark-docker commits.

### Does this PR introduce _any_ user-facing change?
No, dev only

### How was this patch tested?
Will merge it using the script itself.

Closes #1 from Yikun/merge_script.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 merge_spark_docker_pr.py | 571 +++
 1 file changed, 571 insertions(+)

diff --git a/merge_spark_docker_pr.py b/merge_spark_docker_pr.py
new file mode 100755
index 000..578a280
--- /dev/null
+++ b/merge_spark_docker_pr.py
@@ -0,0 +1,571 @@
+#!/usr/bin/env python3
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Utility for creating well-formed pull request merges and pushing them to Apache
+# Spark.
+#   usage: ./merge_spark_docker_pr.py    (see config env vars below)
+#
+# This utility assumes you already have a local Spark git folder and that you
+# have added remotes corresponding to both (i) the github apache Spark
+# mirror and (ii) the apache git repo.
+
+import json
+import os
+import re
+import subprocess
+import sys
+import traceback
+from urllib.request import urlopen
+from urllib.request import Request
+from urllib.error import HTTPError
+
+try:
+    import jira.client
+
+    JIRA_IMPORTED = True
+except ImportError:
+    JIRA_IMPORTED = False
+
+# Location of your Spark git development area
+SPARK_DOCKER_HOME = os.environ.get("SPARK_DOCKER_HOME", os.getcwd())
+# Remote name which points to the GitHub site
+PR_REMOTE_NAME = os.environ.get("PR_REMOTE_NAME", "apache-github")
+# Remote name which points to Apache git
+PUSH_REMOTE_NAME = os.environ.get("PUSH_REMOTE_NAME", "apache")
+# ASF JIRA username
+JIRA_USERNAME = os.environ.get("JIRA_USERNAME", "")
+# ASF JIRA password
+JIRA_PASSWORD = os.environ.get("JIRA_PASSWORD", "")
+# OAuth key used for issuing requests against the GitHub API. If this is not defined, then requests
+# will be unauthenticated. You should only need to configure this if you find yourself regularly
+# exceeding your IP's unauthenticated request rate limit. You can create an OAuth key at
+# https://github.com/settings/tokens. This script only requires the "public_repo" scope.
+GITHUB_OAUTH_KEY = os.environ.get("GITHUB_OAUTH_KEY")
+
+
+GITHUB_BASE = "https://github.com/apache/spark-docker/pull"
+GITHUB_API_BASE = "https://api.github.com/repos/apache/spark-docker"
+JIRA_BASE = "https://issues.apache.org/jira/browse"
+JIRA_API_BASE = "https://issues.apache.org/jira"
+# Prefix added to temporary branches
+BRANCH_PREFIX = "PR_TOOL"
+
+
+def get_json(url):
+    try:
+        request = Request(url)
+        if GITHUB_OAUTH_KEY:
+            request.add_header("Authorization", "token %s" % GITHUB_OAUTH_KEY)
+        return json.load(urlopen(request))
+    except HTTPError as e:
+        if "X-RateLimit-Remaining" in e.headers and e.headers["X-RateLimit-Remaining"] == "0":
+            print(
+                "Exceeded the GitHub API rate limit; see the instructions in "
+   

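The email is cut off inside get_json(); as a self-contained reference, here is a hedged sketch of the same standard-library pattern the function implements (the fetch_pr endpoint and error message are illustrative, not a verbatim continuation of the script):

```python
# Hedged sketch of the pattern get_json() implements above: an optionally
# authenticated GitHub API request using only the standard library.
import json
import os
from urllib.error import HTTPError
from urllib.request import Request, urlopen

GITHUB_API_BASE = "https://api.github.com/repos/apache/spark-docker"
GITHUB_OAUTH_KEY = os.environ.get("GITHUB_OAUTH_KEY")

def fetch_pr(pr_num):
    request = Request("%s/pulls/%s" % (GITHUB_API_BASE, pr_num))
    if GITHUB_OAUTH_KEY:
        # Authenticated requests get a much higher rate limit.
        request.add_header("Authorization", "token %s" % GITHUB_OAUTH_KEY)
    try:
        return json.load(urlopen(request))
    except HTTPError as e:
        if e.headers.get("X-RateLimit-Remaining") == "0":
            raise SystemExit("GitHub API rate limit exceeded; set GITHUB_OAUTH_KEY.")
        raise

if __name__ == "__main__":
    print(fetch_pr(1)["title"])
```
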
[spark] branch master updated: [SPARK-40725][INFRA] Add `mypy-protobuf` to dev/requirements

2022-10-10 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3fa958af326 [SPARK-40725][INFRA] Add `mypy-protobuf` to 
dev/requirements
3fa958af326 is described below

commit 3fa958af326582d8638f36f90b91fe7045f396bf
Author: Ruifeng Zheng 
AuthorDate: Mon Oct 10 17:30:12 2022 +0800

[SPARK-40725][INFRA] Add `mypy-protobuf` to dev/requirements

### What changes were proposed in this pull request?
Add `mypy-protobuf` to dev/requirements

### Why are the changes needed?
`connector/connect/dev/generate_protos.sh` requires this package:
```
DEBUG   /buf.alpha.registry.v1alpha1.GenerateService/GeneratePlugins {"duration": "14.25µs", "http.path": "/buf.alpha.registry.v1alpha1.GenerateService/GeneratePlugins", "http.url": "https://api.buf.build/buf.alpha.registry.v1alpha1.GenerateService/GeneratePlugins", "http.host": "api.buf.build", "http.method": "POST", "http.user_agent": "connect-go/0.4.0-dev (go1.19.2)"}
DEBUG   command {"duration": "9.238333ms"}
Failure: plugin mypy: could not find protoc plugin for name mypy
```

### Does this PR introduce _any_ user-facing change?
No, only for contributors

### How was this patch tested?
Manually checked.
    
Closes #38186 from zhengruifeng/add_mypy-protobuf_to_requirements.

Authored-by: Ruifeng Zheng 
Signed-off-by: Yikun Jiang 
---
 dev/requirements.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/dev/requirements.txt b/dev/requirements.txt
index c610d84c11a..4b47c1f6e83 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -48,4 +48,5 @@ black==22.6.0
 
 # Spark Connect
 grpcio==1.48.1
-protobuf==4.21.6
\ No newline at end of file
+protobuf==4.21.6
+mypy-protobuf
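A quick, hedged way to reproduce or rule out the failure quoted above: mypy-protobuf installs an executable named `protoc-gen-mypy`, which buf/protoc resolve via PATH, so checking for it directly pinpoints the missing plugin. The script below is illustrative, not part of the commit:

```python
# Hedged sanity check: the absence of protoc-gen-mypy on PATH produces
# exactly "could not find protoc plugin for name mypy".
import shutil

plugin = shutil.which("protoc-gen-mypy")
if plugin is None:
    raise SystemExit("protoc-gen-mypy not found; install it with: pip install mypy-protobuf")
print("protoc-gen-mypy found at:", plugin)
```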


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-docker] 01/01: [SPARK-40723][INFRA] Add .asf.yaml to spark-docker

2022-10-10 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git

commit c5b015ac2014bfeb47daafd454b610ae9633f676
Author: Yikun Jiang 
AuthorDate: Mon Oct 10 16:40:34 2022 +0800

[SPARK-40723][INFRA] Add .asf.yaml to spark-docker

### What changes were proposed in this pull request?
This change adds the .asf.yaml as the first commit.

### Why are the changes needed?
It initializes the repository settings.

### Does this PR introduce _any_ user-facing change?
No, dev only

### How was this patch tested?
The result can be verified after merging.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .asf.yaml | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/.asf.yaml b/.asf.yaml
new file mode 100644
index 000..cc7385f
--- /dev/null
+++ b/.asf.yaml
@@ -0,0 +1,39 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features
+---
+github: 
+  description: "Official Dockerfile for Apache Spark"
+  homepage: https://spark.apache.org/
+  labels: 
+- python
+- scala
+- r
+- java
+- big-data
+- jdbc
+- sql
+- spark
+  enabled_merge_buttons: 
+merge: false
+squash: true
+rebase: true
+
+notifications: 
+  pullrequests: revi...@spark.apache.org
+  issues: revi...@spark.apache.org
+  commits: commits@spark.apache.org
+
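A hedged local sanity check for a .asf.yaml like the one above before pushing; PyYAML is an assumed dev dependency, and the assertions simply restate the fields this commit sets:

```python
# Hedged sketch: parse .asf.yaml and restate the settings introduced above.
# PyYAML is an assumed dev dependency (pip install pyyaml).
import yaml

with open(".asf.yaml") as f:
    cfg = yaml.safe_load(f)

github = cfg["github"]
assert github["description"] == "Official Dockerfile for Apache Spark"
# Plain merge commits are disabled; squash and rebase merges are allowed.
assert github["enabled_merge_buttons"] == {"merge": False, "squash": True, "rebase": True}
for route in ("pullrequests", "issues", "commits"):
    # All notification routes point at spark.apache.org mailing lists.
    assert cfg["notifications"][route].endswith("@spark.apache.org")
print(".asf.yaml looks consistent with the commit above")
```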


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-docker] branch master created (now c5b015a)

2022-10-10 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


  at c5b015a  [SPARK-40723][INFRA] Add .asf.yaml to spark-docker

This branch includes the following new commits:

 new c5b015a  [SPARK-40723][INFRA] Add .asf.yaml to spark-docker

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org