Hi,

Maybe someone can shed some light on this.


I am running a PySpark job in minikube.


Because it is PySpark, the following two options are passed to spark-submit:


       spark-submit --verbose \
           --master k8s://$K8S_SERVER \
           --deploy-mode cluster \
           --name pytest \
           --py-files hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/DSBQ.zip \
           --archives hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/${pyspark_venv}.tar.gz#${pyspark_venv} \

The first option, --py-files, sends the zipped PySpark project.


The second option, --archives, sends the package dependencies, which were
created with conda.
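
For reference, here is a minimal Python sketch of how I read the "uri#alias" form given to --archives: the part after "#" is the directory name the archive is unpacked under in the working directory, which matches the "to /opt/spark/work-dir/./pyspark_venv" line in the pod log. The helper name split_archive_spec is hypothetical, and the fallback to the file name when no "#" is present is my assumption about Spark's behaviour, not something shown in the output here.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_archive_spec(spec: str) -> Tuple[str, str]:
    """Split an --archives value of the form 'uri#alias' into (uri, alias).

    Hypothetical helper for illustration only. If no '#alias' is given,
    the alias falls back to the archive's file name (assumed behaviour).
    """
    uri, alias = urldefrag(spec)
    if not alias:
        alias = uri.rsplit("/", 1)[-1]
    return uri, alias


uri, alias = split_archive_spec(
    "hdfs://50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv"
)
print(uri)    # where the archive is fetched from
print(alias)  # directory name used when unpacking in the working directory
```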


This is the output from spark-submit:


Parsed arguments:
  master                  k8s://192.168.49.2:8443
  deployMode              cluster
  executorMemory          5000m
  executorCores           1
  totalExecutorCores      null
  propertiesFile          /opt/spark/conf/spark-defaults.conf
  driverMemory            null
  driverCores             null
  driverExtraClassPath    $SPARK_HOME/jars/*.jar
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            2
  files                   null
  pyFiles                 hdfs://50.140.197.220:9000/minikube/codes/DSBQ.zip
  archives                hdfs://50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv
  mainClass               null
  primaryResource         hdfs://50.140.197.220:9000/minikube/codes/testpackages.py
  name                    pytest
  childArgs               []
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark then tries to unpack that tar.gz file.


In the Python code I am trying to import pandas.
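
A small check along these lines could go at the top of the job to show which interpreter is actually running and whether pandas is visible to it. This is only a sketch using the standard library; the function name describe_python_env is hypothetical and not part of the project.

```python
import importlib.util
import sys


def describe_python_env(module_name: str = "pandas") -> dict:
    """Report the running interpreter and whether a top-level module can be found.

    Hypothetical diagnostic helper; it looks the module up without importing it.
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,        # interpreter running this code
        "version": sys.version.split()[0],   # e.g. 3.7.x
        "module": module_name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


for key, value in describe_python_env().items():
    print(f"{key}: {value}")
```

If "executable" does not point inside the unpacked pyspark_venv directory, the job is not using the interpreter shipped in the archive.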



This is what the pod logs show:


Unpacking an archive hdfs://50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv from /tmp/spark-57c6ace6-c01f-420c-ab88-0cdb9015eb92/pyspark_venv.tar.gz to /opt/spark/work-dir/./pyspark_venv

Exception in thread "main" ExitCodeException exitCode=2: tar: lib/python3.7/site-packages/pandas/tests/util/__pycache__/test_assert_categorical_equal.cpython-37.pyc: Cannot open: Cannot allocate memory
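
The failure comes from tar while it is extracting into the pod's work-dir. One thing that may be worth checking (an assumption on my part, not a confirmed diagnosis) is how large the archive becomes once unpacked, compared with the memory and storage available to the pod. The sketch below lists the archive with Python's tarfile module without extracting it; summarize_archive is a hypothetical helper name, and the path is whatever local copy of the tar.gz is at hand.

```python
import sys
import tarfile


def summarize_archive(path: str) -> dict:
    """Count regular files and uncompressed bytes in a tar archive without extracting it."""
    files = 0
    total_bytes = 0
    test_or_cache_files = 0
    with tarfile.open(path, "r:*") as archive:  # "r:*" auto-detects gzip etc.
        for member in archive:
            if not member.isfile():
                continue
            files += 1
            total_bytes += member.size
            # Entries such as pandas/tests/... or __pycache__/... are not
            # needed at runtime but still get unpacked.
            parts = member.name.split("/")
            if "__pycache__" in parts or "tests" in parts:
                test_or_cache_files += 1
    return {
        "files": files,
        "uncompressed_bytes": total_bytes,
        "test_or_cache_files": test_or_cache_files,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_archive.py pyspark_venv.tar.gz
    print(summarize_archive(sys.argv[1]))
```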


However, this works fine when I run the code in local mode as opposed to k8s!


Thanks


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
