Hi, I have an issue with my PySpark job running on Kubernetes (testing on minikube).
The project is zipped as DSBQ.zip and passed to spark-submit via --py-files, with the zip file on HDFS (the pod can read it). DSBQ.zip is the root zip file for the application project; it is zipped at the root and has the following structure:

$ ls DSBQ
__init__.py  assembly  conf  data  deployment  lib  linux  othermisc  sparkutils  src  tests

Under the conf folder there is a file called config.yml that is read in src/configure.py as follows:

import yaml
import sys
import os

with open("/home/hduser/dba/bin/python/DSBQ/conf/config.yml", 'r') as file:
    config: dict = yaml.safe_load(file)

That absolute path, /home/hduser/dba/bin/python/DSBQ/conf/config.yml, is not recognised in the pod. A relative path is not recognised either:

with open("DSBQ/conf/config.yml", 'r') as file:

PySpark can read from HDFS, so this is the spark-submit command used:

spark-submit --verbose \
  --master k8s://$K8S_SERVER \
  --deploy-mode cluster \
  --name pytest \
  --py-files hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/DSBQ.zip,hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/dependencies_short.zip \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.driver.limit.cores=1 \
  --conf spark.executor.cores=1 \
  --conf spark.executor.memory=500m \
  --conf spark.kubernetes.container.image=${IMAGE} \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-serviceaccount \
  --conf spark.kubernetes.file.upload.path=$SOURCE_DIR \
  --conf spark.kubernetes.driver.volumes.$VOLUME_TYPE.$VOLUME_NAME.mount.path=$MOUNT_PATH \
  --conf spark.kubernetes.driver.volumes.$VOLUME_TYPE.$VOLUME_NAME.options.path=$MOUNT_PATH \
  --conf spark.kubernetes.executor.volumes.$VOLUME_TYPE.$VOLUME_NAME.mount.path=$MOUNT_PATH \
  --conf spark.kubernetes.executor.volumes.$VOLUME_TYPE.$VOLUME_NAME.options.path=$MOUNT_PATH \
  hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/${APPLICATION}

I have not managed to read the yaml file from the external mounts or by other methods.
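For context, since --py-files places DSBQ.zip on sys.path inside the pod, one route that in principle avoids filesystem paths entirely is reading the bundled file through pkgutil.get_data. A hedged sketch, assuming the zip contains DSBQ/__init__.py and DSBQ/conf/config.yml (the sketch builds a throwaway zip with that layout so it is runnable standalone; in the pod the zip would already be on sys.path):

    # Sketch: read a data file bundled inside a zipped package on sys.path,
    # instead of an absolute path that does not exist inside the pod.
    import os
    import pkgutil
    import sys
    import tempfile
    import zipfile

    # Assumption: build a stand-in DSBQ.zip with the layout described above.
    tmp = tempfile.mkdtemp()
    zip_path = os.path.join(tmp, "DSBQ.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("DSBQ/__init__.py", "")
        zf.writestr("DSBQ/conf/config.yml", "common:\n  appName: 'md'\n")

    # spark-submit --py-files effectively does this step for you in the pod.
    sys.path.insert(0, zip_path)

    # pkgutil resolves the resource through the zip importer.
    raw = pkgutil.get_data("DSBQ", "conf/config.yml")
    text = raw.decode("utf-8")
    print(text)  # this string could then be passed to yaml.safe_load

Whether this works depends on the zip actually containing the DSBQ package directory at its root, which I cannot verify from here.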
The only way I have managed to read it is through sc.wholeTextFiles from HDFS:

lines = sc.wholeTextFiles("hdfs://$HOST:$PORT/minikube/codes/config.yml")
rdd = lines.map(lambda x: x[1])
l = rdd.collect()
print(l)

l returns the file content as a list, like the sample below:

['common:\n  appName: \'md\'\n  newtopic: \'newtopic\'\nplot_fonts:\n  font:\n    \'family\': \'serif\'\n    \'color\': \'darkred\'\n    \'weight\': \'normal\'\n    \'size\': 10\n\n  # define font dictionary\n  font_small:\n    \'family\': \'serif\'\n    \'color\': \'darkred\'\n    \'weight\': \'normal\'\n    \'size\': 7\n']

which corresponds to these lines in the yaml file:

common:
  appName: 'md'
  newtopic: 'newtopic'
plot_fonts:
  font:
    'family': 'serif'
    'color': 'darkred'
    'weight': 'normal'
    'size': 10

  # define font dictionary
  font_small:
    'family': 'serif'
    'color': 'darkred'
    'weight': 'normal'
    'size': 7

Now I need to read that list and create a dict out of it, like:

config: dict = yaml.safe_load(l)

which throws an error:

  File "/tmp/spark-34d56d02-ce8a-442f-9c84-f265f1c279e2/testpackages.py", line 71, in <module>
    main()
  File "/tmp/spark-34d56d02-ce8a-442f-9c84-f265f1c279e2/testpackages.py", line 42, in main
    config: dict = yaml.safe_load(l)
  File "/tmp/spark-34d56d02-ce8a-442f-9c84-f265f1c279e2/dependencies_short.zip/yaml/__init__.py", line 162, in safe_load
  File "/tmp/spark-34d56d02-ce8a-442f-9c84-f265f1c279e2/dependencies_short.zip/yaml/__init__.py", line 112, in load
  File "/tmp/spark-34d56d02-ce8a-442f-9c84-f265f1c279e2/dependencies_short.zip/yaml/loader.py", line 34, in __init__
  File "/tmp/spark-34d56d02-ce8a-442f-9c84-f265f1c279e2/dependencies_short.zip/yaml/reader.py", line 85, in __init__
  File "/tmp/spark-34d56d02-ce8a-442f-9c84-f265f1c279e2/dependencies_short.zip/yaml/reader.py", line 124, in determine_encoding
  File "/tmp/spark-34d56d02-ce8a-442f-9c84-f265f1c279e2/dependencies_short.zip/yaml/reader.py", line 178, in update_raw
AttributeError: 'list' object has no attribute 'read'

I am sure there is a solution to read this yaml file inside the pod?
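For what it's worth, the traceback itself points at the immediate problem: yaml.safe_load expects a string or a file-like object, while wholeTextFiles yields one (path, content) pair per file, so after collect() the variable l is a one-element list holding the whole file as a single string. A minimal sketch of parsing the first element (the sample content above is hard-coded here so the snippet stands alone):

    import yaml

    # Stand-in for rdd.collect(): a one-element list holding the file content.
    l = ["common:\n  appName: 'md'\n  newtopic: 'newtopic'\n"]

    # safe_load wants a str or a stream, not a list -> parse the first element.
    config: dict = yaml.safe_load(l[0])
    print(config)  # {'common': {'appName': 'md', 'newtopic': 'newtopic'}}

If the RDD could ever contain more than one file, joining or iterating over the collected elements would be needed instead of indexing [0].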
Thanks

view my LinkedIn profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>