Stanis Shkel created SPARK-26401:
------------------------------------

             Summary: [k8s] Init container drops necessary config options for pulling jars from azure storage
                 Key: SPARK-26401
                 URL: https://issues.apache.org/jira/browse/SPARK-26401
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 2.3.2
            Reporter: Stanis Shkel
I am running a spark-submit command that pulls a jar from a remote private Azure storage account. As far as I understand, the jar is supposed to be pulled within the init container of the driver. However, the container doesn't inherit the "spark.hadoop.fs.azure.account.key.$(STORAGE_ACCT).blob.core.windows.net=$(STORAGE_SECRET)" parameter that I pass in when running spark-submit.

Here is what I have found so far. The spark-init container is invoked via the following command: [https://github.com/apache/spark/blob/branch-2.3/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L83]

This in the end turns into the following shell call:

{code:bash}
exec /usr/lib/jvm/java-1.8-openjdk/bin/java -cp '/opt/spark/conf/:/opt/spark/jars/*' -Xmx1g org.apache.spark.deploy.k8s.SparkPodInitContainer /etc/spark-init/spark-init.properties
{code}

If I cat out the spark-init properties file, the only parameters present are:

{code}
spark.kubernetes.mountDependencies.jarsDownloadDir=/var/spark-data/spark-jars
spark.kubernetes.initContainer.remoteJars=wasbs\://mycontai...@testaccount.blob.core.windows.net/jars/myjar.jar,wasbs\://mycontai...@testaccount.blob.core.windows.net/jars/myjar.jar
spark.kubernetes.mountDependencies.filesDownloadDir=/var/spark-data/spark-files
{code}

My guess is that these are the params set here: [https://github.com/apache/spark/blob/branch-2.3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/initcontainer/BasicInitContainerConfigurationStep.scala#L49]

However, spark.hadoop.fs.azure.account.key is present neither in that file nor in the environment. This causes the download of the jar to fail with the following exception:

{code:bash}
Exception in thread "main" org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: Container mycontainer in account testaccount.blob.core.windows.net not found, and we can't create it using anoynomous credentials.
	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:938)
	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:438)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1048)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
	at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1910)
	at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:700)
	at org.apache.spark.util.Utils$.fetchFile(Utils.scala:492)
	at org.apache.spark.deploy.k8s.FileFetcher.fetchFile(SparkPodInitContainer.scala:91)
	at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1$$anonfun$apply$2.apply(SparkPodInitContainer.scala:81)
	at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1$$anonfun$apply$2.apply(SparkPodInitContainer.scala:79)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
	at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1.apply(SparkPodInitContainer.scala:79)
	at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1.apply(SparkPodInitContainer.scala:77)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.deploy.k8s.SparkPodInitContainer.downloadFiles(SparkPodInitContainer.scala:77)
	at org.apache.spark.deploy.k8s.SparkPodInitContainer.run(SparkPodInitContainer.scala:56)
	at org.apache.spark.deploy.k8s.SparkPodInitContainer$.main(SparkPodInitContainer.scala:113)
	at org.apache.spark.deploy.k8s.SparkPodInitContainer.main(SparkPodInitContainer.scala)
Caused by: org.apache.hadoop.fs.azure.AzureException: Container qrefinery in account jr3e3d.blob.core.windows.net not found, and we can't create it using anoynomous credentials.
	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:730)
	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:933)
	... 22 more
{code}

I am certain that the parameter is being passed to the driver correctly: I can see -Dspark.hadoop.fs.azure.account.key as one of the flags in the driver CMD. Due to https://issues.apache.org/jira/browse/SPARK-26400 the spark-init container "succeeds", and the driver then fails at the missing-jar step.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
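One possible mitigation follows from the init-container command quoted in the report: the JVM is started with '/opt/spark/conf/' on its classpath, and Hadoop's Configuration loads core-site.xml from the classpath, so an account key baked into such a file in the container image should be visible to the init container even though spark.hadoop.* options never reach spark-init.properties. This is an untested sketch, not part of the original report; STORAGE_ACCT, STORAGE_SECRET, and the /tmp/spark-conf path are placeholders.

```shell
# Workaround sketch (assumption, not from the report): generate a core-site.xml
# carrying the Azure storage key, to be placed where the init container's
# classpath will find it (e.g. /opt/spark/conf in the image).
STORAGE_ACCT="testaccount"   # placeholder account name
STORAGE_SECRET="REDACTED"    # placeholder secret

CONF_DIR="/tmp/spark-conf"   # stand-in for /opt/spark/conf inside the image
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.azure.account.key.${STORAGE_ACCT}.blob.core.windows.net</name>
    <value>${STORAGE_SECRET}</value>
  </property>
</configuration>
EOF

# Confirm the key property landed in the generated file (prints the match count).
grep -c "fs.azure.account.key.${STORAGE_ACCT}.blob.core.windows.net" "$CONF_DIR/core-site.xml"
```

In a real image the file would live under $SPARK_HOME/conf (or be reached via HADOOP_CONF_DIR), and the secret should come from a Kubernetes Secret rather than a literal in the Dockerfile.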