I have been struggling with this.

This is on Kubernetes (not that it matters; minikube itself is working fine). In one of
the modules, called configure.py, I am importing the yaml module:


import yaml


This throws the following error:


    import yaml
ModuleNotFoundError: No module named 'yaml'


I have been through a number of loops.


First, I created a virtual environment archive, pyspark_venv.tar.gz, that includes the
yaml module, and passed it to spark-submit as follows:


spark-submit --verbose --master k8s://192.168.49.2:8443 \
  --archives=hdfs://50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv \
  --deploy-mode cluster --name pytest \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.driver.limit.cores=1 \
  --conf spark.executor.cores=1 \
  --conf spark.executor.memory=500m \
  --conf spark.kubernetes.container.image=pytest-repo/spark-py:3.1.1 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-serviceaccount \
  --py-files hdfs://50.140.197.220:9000/minikube/codes/DSBQ.zip \
  hdfs://50.140.197.220:9000/minikube/codes/testyml.py
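
For what it is worth, the archive was meant to be the sort of thing the venv-pack
recipe in the Spark "Python Package Management" docs produces; the sketch below is
roughly that recipe, not necessarily the exact commands I ran. Note that nothing in
the spark-submit above points the Python interpreter at the unpacked environment,
and I am not sure whether that is needed (or how to do it) in cluster mode on
Kubernetes.

    # build a venv containing pyyaml and pack it up (venv-pack route from the
    # Spark docs; a sketch only)
    python3 -m venv pyspark_venv
    source pyspark_venv/bin/activate
    pip install pyyaml venv-pack
    venv-pack -o pyspark_venv.tar.gz

    # the docs then point driver/executors at the unpacked archive, e.g.
    #   export PYSPARK_PYTHON=./pyspark_venv/bin/python
    # or possibly --conf spark.pyspark.python=./pyspark_venv/bin/python
    # -- I have not set anything like this in the submit above

The verbose output from the submit shows the archive being picked up and unpacked: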


Parsed arguments:
  master                  k8s://192.168.49.2:8443
  deployMode              cluster
  executorMemory          500m
  executorCores           1
  totalExecutorCores      null
  propertiesFile          /opt/spark/conf/spark-defaults.conf
  driverMemory            null
  driverCores             null
  driverExtraClassPath    $SPARK_HOME/jars/*.jar
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            1
  files                   null
  pyFiles                 hdfs://50.140.197.220:9000/minikube/codes/DSBQ.zip
  archives                hdfs://50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv
  mainClass               null
  primaryResource         hdfs://50.140.197.220:9000/minikube/codes/testyml.py
  name                    pytest
  childArgs               []
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true


Unpacking an archive hdfs://50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv from /tmp/spark-d339a76e-090c-4670-89aa-da723d6e9fbc/pyspark_venv.tar.gz to /opt/spark/work-dir/./pyspark_venv


printing sys.path
/tmp/spark-20050bca-7eb2-4b06-9bc3-42dce97118fc
/tmp/spark-20050bca-7eb2-4b06-9bc3-42dce97118fc/DSBQ.zip
/opt/spark/python/lib/pyspark.zip
/opt/spark/python/lib/py4j-0.10.9-src.zip
/opt/spark/jars/spark-core_2.12-3.1.1.jar
/usr/lib/python37.zip
/usr/lib/python3.7
/usr/lib/python3.7/lib-dynload
/usr/local/lib/python3.7/dist-packages
/usr/lib/python3/dist-packages

 Printing user_paths
['/tmp/spark-20050bca-7eb2-4b06-9bc3-42dce97118fc/DSBQ.zip',
'/opt/spark/python/lib/pyspark.zip',
'/opt/spark/python/lib/py4j-0.10.9-src.zip',
'/opt/spark/jars/spark-core_2.12-3.1.1.jar']
checking yaml
Traceback (most recent call last):
  File "/tmp/spark-20050bca-7eb2-4b06-9bc3-42dce97118fc/testyml.py", line
18, in <module>
    main()
  File "/tmp/spark-20050bca-7eb2-4b06-9bc3-42dce97118fc/testyml.py", line
15, in main
    import yaml
ModuleNotFoundError: No module named 'yaml'


Well, it does not matter whether it is yaml or numpy: the modules simply cannot be
found. From the sys.path printout above it looks as though only the system Python 3.7
paths are there, with nothing under the unpacked pyspark_venv. How can I find out
whether the gz file was unpacked OK?
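
The sort of check I had in mind, unless there is a better way, is to exec into the
running driver pod and look at the work directory mentioned in the unpack message
above (the pod name below is just a placeholder):

    # find the driver pod in the spark namespace
    kubectl get pods -n spark
    # check whether --archives really unpacked the venv where the log says it did
    kubectl exec -n spark <pytest-driver-pod> -- ls -l /opt/spark/work-dir/pyspark_venv
    kubectl exec -n spark <pytest-driver-pod> -- ls /opt/spark/work-dir/pyspark_venv/bin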


Thanks


