ninebigbig opened a new pull request, #39967:
URL: https://github.com/apache/spark/pull/39967
[SPARK-42395][K8S]The code logic of the configmap max size validation lacks
extra content
### What changes were proposed in this pull request?
This PR aims to add the extra content of the configmap into the configmap
max size validation
In each configmap, Spark adds an extra content in a fixed format,this extra
content of the configmap is as belows:
spark.kubernetes.namespace: default
spark.properties: |
#Java properties built from Kubernetes config map with name:
spark-exec-b47b438630eec12d-conf-map
#Wed Feb 08 20:10:19 CST 2023
spark.kubernetes.namespace=default
The example specific content comes from the command output:
# kubectl get cm spark-exec-b47b438630eec12d-conf-map -o yaml
apiVersion: v1
data:
spark.kubernetes.namespace: default
spark.properties: |
#Java properties built from Kubernetes config map with name:
spark-exec-b47b438630eec12d-conf-map
#Wed Feb 08 20:10:19 CST 2023
spark.kubernetes.namespace=default
immutable: true
kind: ConfigMap
metadata:
creationTimestamp: "2023-02-08T12:14:13Z"
labels:
spark-app-selector: spark-9c4a3cc1d752491cba899135d8fbf492
spark-role: executor
name: spark-exec-b47b438630eec12d-conf-map
namespace: default
resourceVersion: "62638833"
uid: 28f7bbd1-38aa-4e8e-b715-9cc8eb047074
### Why are the changes needed?
If no chanages,there may be a problem like jira SPARK-42344
### Does this PR introduce _any_ user-facing change?
The major of this pr is the same to the jira SPARK-42344.
And additional,this pr droped the duplicated content from the extra content
of the configmap.
before:
spark.kubernetes.namespace: default
spark.properties: |
#Java properties built from Kubernetes config map with name:
spark-exec-b47b438630eec12d-conf-map
#Wed Feb 08 20:10:19 CST 2023
spark.kubernetes.namespace=default
after:
spark.properties: |
#Java properties built from Kubernetes config map with name:
spark-exec-e33068863475c9ee-conf-map
#Thu Feb 09 12:36:20 CST 2023
spark.kubernetes.namespace=default
User can get the namespace from spark.properties.
But mostly,on the k8s cluster,user can get the namespace by environment
variable or downwardapi
### How was this patch tested?
Local test
TEST CASE 1:
1.mock a file named log4j.properties in spark home directory
2.the file name length is 16,and the extra content size is 163.
set the size of the log4j.properties to:
1048576 - 163 -16 = 1048397
3.truncate the file to 1048397
# truncate -s 1048397 log4j.properties
# ls -l
-rwxr-xr-x 1 root root 1048397 Feb 9 16:13 log4j.properties
4.test with the commond spark-submit with arg :
--conf
spark.home=/home/work/code/apache/spark-github/dist
5.Test expectations:
The configmap is not include the content log4j.properties.Because the size
is just equal to 1048576.
There is only one data in configmap spark-exec-d09ef186354c6ebe-conf-map
# kubectl get cm
NAME DATA AGE
ingress-traefik 1 203d
kube-root-ca.crt 1 227d
spark-exec-d09ef186354c6ebe-conf-map 1 33s
spark-standalone-exporter 1 203d
spark-standalone-graphite-mapping 1 203d
# kubectl get cm spark-exec-e33068863475c9ee-conf-map -o yaml
apiVersion: v1
data:
spark.properties: |
#Java properties built from Kubernetes config map with name:
spark-exec-e33068863475c9ee-conf-map
#Thu Feb 09 12:36:20 CST 2023
spark.kubernetes.namespace=default
immutable: true
kind: ConfigMap
metadata:
creationTimestamp: "2023-02-09T04:36:20Z"
labels:
spark-app-selector: spark-8e68465119bd411ba0b0cc797abe294c
spark-role: executor
name: spark-exec-e33068863475c9ee-conf-map
namespace: default
resourceVersion: "62889868"
uid: ba8d7f31-a4c5-42fd-9aba-a02b73eb37bc
TEST CASE 2:
1.the file name length is 16,and the extra content size is 163.
set the size of the log4j.properties to:
1048576 - 163 - 16 - 1= 1048396
2.truncate the file to 1048396
# truncate -s 1048396 log4j.properties
# ls -l
-rwxr-xr-x 1 root root 1048396 Feb 9 16:32 log4j.properties
3.test with the commond spark-submit with arg :
--conf
spark.home=/home/work/code/apache/spark-github/dist
4.Test expectations:
The configmap is include the content log4j.properties.
There is 2 datas in the configmap spark-exec-1d90d186354e63ee-conf-map
# kubectl get cm
NAME DATA AGE
ingress-traefik 1 203d
kube-root-ca.crt 1 227d
spark-exec-1d90d186354e63ee-conf-map 2 52s
spark-standalone-exporter 1 203d
spark-standalone-graphite-mapping 1 203d
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]