Hello,

I am trying to deploy a Spark streaming application using the Spark
Kubernetes Operator, but the application crashes after a while.

After describing the CRD with kubectl -n my-namespace describe sparkapplication my-app, I see the following:

    Qos Class:             Guaranteed
    Start Time:            2025-04-29T12:18:15Z
    Last Transition Time:  2025-04-29T12:24:37.365547649Z
    Message:               The Spark application failed to get enough executors in the given time threshold.
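
If it helps with debugging, these are the kinds of checks I can run and share output from (the pod names below are placeholders, not the real names from my cluster):

    # see whether executor pods get created at all, and whether they stay Pending
    kubectl -n my-namespace get pods
    # recent events often show quota, scheduling, or image-pull problems
    kubectl -n my-namespace get events --sort-by=.lastTimestamp
    # driver log, to look for executor allocation errors (driver pod name is a placeholder)
    kubectl -n my-namespace logs my-app-test-driver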

My deployment spec looks something like this:

apiVersion: spark.apache.org/v1alpha1
kind: SparkApplication
metadata:
  name: my-app-test
  labels:
    app: my-app
  annotations:
    owner: my-team
spec:
  runtimeVersions:
    scalaVersion: "2.13"
    sparkVersion: "3.5.0"
  mainClass: "com.myorg.MyStreamingApp"
  jars: "local:///opt/spark/app/my-app-jar-2.2.7.jar"
  sparkConf:
    spark.kubernetes.authenticate.driver.serviceAccountName: "some-service-account"
    spark.executor.instances: "2"
    spark.executor.memory: "1g"
    spark.executor.cores: "1"
    spark.hadoop.fs.s3a.access.key: *****
    spark.hadoop.fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
    spark.hadoop.fs.s3a.endpoint: *****
    spark.hadoop.fs.s3a.fast.upload: "true"
    spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
    spark.hadoop.fs.s3a.path.style.access: "true"
    spark.hadoop.fs.s3a.secret.key: *****
    spark.kerberos.keytab: local:///mnt/keytabs/some-principal.keytab
    spark.kerberos.principal: *****
    spark.kubernetes.driverEnv.KRB5_CONFIG: /mnt/krb5/krb5.conf
    spark.kubernetes.executorEnv.KRB5_CONFIG: /mnt/krb5/krb5.conf
    spark.kubernetes.file.upload.path: s3a://enrichment/tmp
    spark.kubernetes.kerberos.krb5.configMapName: krb5-config-map
    spark.sql.streaming.metricsEnabled: "true"
    spark.kubernetes.container.image: "my-image"
  applicationTolerations:
    instanceConfig:
      minExecutors: 2
      initExecutors: 1
      maxExecutors: 4
  driverSpec:
    podTemplateSpec:
      spec:
        imagePullSecrets:
          - name: some-nexus
        containers:
          - name: spark-driver
            imagePullPolicy: IfNotPresent
            resources:
              limits:
                cpu: "1000m"
                ...
            volumeMounts:
              - name: app-jar
                mountPath: "/opt/spark/app"
              - name: keytabs
            env:
              - name: SECRETS_ROOT_DIR
                value: /mnt/secrets
        initContainers:
          - name: download-jar
            image: "some-image"
            volumeMounts:
              - name: app-jar
                mountPath: "/opt/spark/app"
            env:
              - name: NEXUS_USERNAME
                value: *****
              ...
            command: ["sh", "-c"]
            args:
              - "curl -u <rest_of_command>"
        volumes:
          - name: app-jar
            emptyDir: {}
          - name: keytabs
            secret:
              secretName: keytabs
      metadata:
        labels:
          version: "3.5.0"
  executorSpec:
    podTemplateSpec:
      spec:
        imagePullSecrets:
          - name: some-nexus
        containers:
          - name: spark-executor
            imagePullPolicy: IfNotPresent
            resources:
              limits:
                cpu: "1000m"
                ...
            volumeMounts:
              - name: app-jar
                mountPath: "/opt/spark/app"
              - name: keytabs
                mountPath: "/mnt/keytabs"
            env:
              - name: SECRETS_ROOT_DIR
                value: /mnt/secrets
        initContainers:
          - name: download-jar
            image: "some-image"
            volumeMounts:
              - name: app-jar
                mountPath: "/opt/spark/app"
            env:
              - name: NEXUS_USERNAME
                value: *****
              ...
            command: ["sh", "-c"]
            args:
              - "curl -u <rest_of_command>"
        volumes:
          - name: app-jar
            emptyDir: {}
          - name: keytabs
            secret:
              secretName: keytabs
        ...
      metadata:
        labels:
          version: "3.5.0"

Has anyone faced this issue? I would appreciate any help on this matter.


Thanks,
Nilanjan Sarkar
