I have been struggling with this.

This is on Kubernetes (not that it matters; minikube itself is working fine). In one of the modules, called configure.py, I am importing the yaml module:

import yaml

and it throws:

ModuleNotFoundError: No module named 'yaml'

I have been through a number of loops. First I created a virtual environment archive, pyspark_venv.tar.gz, that includes the yaml module, and passed it to spark-submit as follows:

+ spark-submit --verbose --master k8s://192.168.49.2:8443 '--archives=hdfs://50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv' --deploy-mode cluster --name pytest --conf 'spark.kubernetes.namespace=spark' --conf 'spark.executor.instances=1' --conf 'spark.kubernetes.driver.limit.cores=1' --conf 'spark.executor.cores=1' --conf 'spark.executor.memory=500m' --conf 'spark.kubernetes.container.image=pytest-repo/spark-py:3.1.1' --conf 'spark.kubernetes.authenticate.driver.serviceAccountName=spark-serviceaccount' --py-files hdfs://50.140.197.220:9000/minikube/codes/DSBQ.zip hdfs://50.140.197.220:9000/minikube/codes/testyml.py

The parsed arguments from the verbose output are:

Parsed arguments:
  master                  k8s://192.168.49.2:8443
  deployMode              cluster
  executorMemory          500m
  executorCores           1
  totalExecutorCores      null
  propertiesFile          /opt/spark/conf/spark-defaults.conf
  driverMemory            null
  driverCores             null
  driverExtraClassPath    $SPARK_HOME/jars/*.jar
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            1
  files                   null
  pyFiles                 hdfs://50.140.197.220:9000/minikube/codes/DSBQ.zip
  archives                hdfs://50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv
  mainClass               null
  primaryResource         hdfs://50.140.197.220:9000/minikube/codes/testyml.py
  name                    pytest
  childArgs               []
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

The driver log shows the archive being unpacked:

Unpacking an archive hdfs://50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv from /tmp/spark-d339a76e-090c-4670-89aa-da723d6e9fbc/pyspark_venv.tar.gz to /opt/spark/work-dir/./pyspark_venv

Yet the test script still cannot import yaml:

printing sys.path
/tmp/spark-20050bca-7eb2-4b06-9bc3-42dce97118fc
/tmp/spark-20050bca-7eb2-4b06-9bc3-42dce97118fc/DSBQ.zip
/opt/spark/python/lib/pyspark.zip
/opt/spark/python/lib/py4j-0.10.9-src.zip
/opt/spark/jars/spark-core_2.12-3.1.1.jar
/usr/lib/python37.zip
/usr/lib/python3.7
/usr/lib/python3.7/lib-dynload
/usr/local/lib/python3.7/dist-packages
/usr/lib/python3/dist-packages

Printing user_paths
['/tmp/spark-20050bca-7eb2-4b06-9bc3-42dce97118fc/DSBQ.zip', '/opt/spark/python/lib/pyspark.zip', '/opt/spark/python/lib/py4j-0.10.9-src.zip', '/opt/spark/jars/spark-core_2.12-3.1.1.jar']

checking yaml
Traceback (most recent call last):
  File "/tmp/spark-20050bca-7eb2-4b06-9bc3-42dce97118fc/testyml.py", line 18, in <module>
    main()
  File "/tmp/spark-20050bca-7eb2-4b06-9bc3-42dce97118fc/testyml.py", line 15, in main
    import yaml
ModuleNotFoundError: No module named 'yaml'

It does not matter whether it is yaml or numpy; the driver simply cannot find the modules. How can I find out whether the gz file was unpacked correctly?

Thanks.
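For reference, the failing part of testyml.py boils down to roughly the following. This is a trimmed sketch, not the full script, so the line numbers in the traceback above will not match exactly:

import os
import sys

def main():
    # show what the driver's Python interpreter can actually see
    print("printing sys.path")
    for p in sys.path:
        print(p)

    # show what came in via PYTHONPATH
    user_paths = os.environ.get("PYTHONPATH", "").split(os.pathsep)
    print("Printing user_paths")
    print(user_paths)

    print("checking yaml")
    import yaml  # this is the import that raises ModuleNotFoundError

if __name__ == "__main__":
    main()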
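The only check I have come up with myself is to add something like this to the driver script. It assumes the --archives alias "pyspark_venv" really is unpacked into the driver container's current working directory (the verbose log points at /opt/spark/work-dir/./pyspark_venv), which may not be a safe assumption:

import os

# Rough check: does the unpacked venv directory exist, and does it contain
# a site-packages with the expected modules (e.g. yaml)?
venv_dir = os.path.join(os.getcwd(), "pyspark_venv")
print("venv dir exists:", os.path.isdir(venv_dir))
if os.path.isdir(venv_dir):
    print("top level:", os.listdir(venv_dir))  # expect bin/, lib/, pyvenv.cfg, ...
    lib_dir = os.path.join(venv_dir, "lib")
    if os.path.isdir(lib_dir):
        for name in os.listdir(lib_dir):  # e.g. python3.7
            sp = os.path.join(lib_dir, name, "site-packages")
            if os.path.isdir(sp):
                print(sp, "->", sorted(os.listdir(sp))[:20])  # yaml should be listed here

But I am not sure this is the right way to verify it.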
Kubernetes (not that matters minikube is working fine. In one of the module called configure.py I am importing yaml module import yaml This is throwing errors import yaml ModuleNotFoundError: No module named 'yaml' I have been through a number of loops. First I created virtual environment pyspark_venv.tar.gz that includes yaml module and past it to spark-submit as follows + spark-submit --verbose --master k8s://192.168.49.2:8443 '--archives=hdfs:// 50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv' --deploy-mode cluster --name pytest --conf 'spark.kubernetes.namespace=spark' --conf 'spark.executor.instances=1' --conf 'spark.kubernetes.driver.limit.cores=1' --conf 'spark.executor.cores=1' --conf 'spark.executor.memory=500m' --conf 'spark.kubernetes.container.image=pytest-repo/spark-py:3.1.1' --conf 'spark.kubernetes.authenticate.driver.serviceAccountName=spark-serviceaccount' --py-files hdfs://50.140.197.220:9000/minikube/codes/DSBQ.zip hdfs:// 50.140.197.220:9000/minikube/codes/testyml.py Parsed arguments: master k8s://192.168.49.2:8443 deployMode cluster executorMemory 500m executorCores 1 totalExecutorCores null propertiesFile /opt/spark/conf/spark-defaults.conf driverMemory null driverCores null driverExtraClassPath $SPARK_HOME/jars/*.jar driverExtraLibraryPath null driverExtraJavaOptions null supervise false queue null numExecutors 1 files null pyFiles hdfs://50.140.197.220:9000/minikube/codes/DSBQ.zip archives hdfs:// 50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv mainClass null primaryResource hdfs:// 50.140.197.220:9000/minikube/codes/testyml.py name pytest childArgs [] jars null packages null packagesExclusions null repositories null verbose true Unpacking an archive hdfs:// 50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv from /tmp/spark-d339a76e-090c-4670-89aa-da723d6e9fbc/pyspark_venv.tar.gz to /opt/spark/work-dir/./pyspark_venv printing sys.path /tmp/spark-20050bca-7eb2-4b06-9bc3-42dce97118fc /tmp/spark-20050bca-7eb2-4b06-9bc3-42dce97118fc/DSBQ.zip /opt/spark/python/lib/pyspark.zip /opt/spark/python/lib/py4j-0.10.9-src.zip /opt/spark/jars/spark-core_2.12-3.1.1.jar /usr/lib/python37.zip /usr/lib/python3.7 /usr/lib/python3.7/lib-dynload /usr/local/lib/python3.7/dist-packages /usr/lib/python3/dist-packages Printing user_paths ['/tmp/spark-20050bca-7eb2-4b06-9bc3-42dce97118fc/DSBQ.zip', '/opt/spark/python/lib/pyspark.zip', '/opt/spark/python/lib/py4j-0.10.9-src.zip', '/opt/spark/jars/spark-core_2.12-3.1.1.jar'] checking yaml Traceback (most recent call last): File "/tmp/spark-20050bca-7eb2-4b06-9bc3-42dce97118fc/testyml.py", line 18, in <module> main() File "/tmp/spark-20050bca-7eb2-4b06-9bc3-42dce97118fc/testyml.py", line 15, in main import yaml ModuleNotFoundError: No module named 'yaml' Well it does not matter if it is yaml or numpy. It just cannot find the modules. How can I find out if the gz file is unpacked OK? Thanks view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.