
Maybe someone can shed some light on this.

I am running a PySpark job in minikube.

Because it is PySpark, the following two parameters are used (--py-files and --archives):

       spark-submit --verbose \
           --master k8s://$K8S_SERVER \
           --deploy-mode cluster \
           --name pytest \
           --py-files hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/DSBQ.zip \


The first one, --py-files, sends the zipped PySpark project.

The second one, --archives, sends the package dependencies created
with conda.
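
To rule out a damaged archive before it is uploaded to HDFS, a quick local check could look like the sketch below. It only uses the Python standard library. The function name top_level_entries is my own and is not part of Spark or conda. It assumes the archive is a gzip-compressed tar file.

```python
import tarfile


def top_level_entries(archive_path):
    """Return the sorted top-level names inside a gzip-compressed tar archive."""
    names = set()
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar.getnames():
            # Treat "./bin/python" and "bin/python" as the same first component.
            parts = [p for p in member.split("/") if p not in ("", ".")]
            if parts:
                names.add(parts[0])
    return sorted(names)
```

For an environment packed with conda I would expect something like bin and lib among the returned names, but that is an assumption about how the archive was built.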

This is the output from Spark:

Parsed arguments:
  master                  k8s://
  deployMode              cluster
  executorMemory          5000m
  executorCores           1
  totalExecutorCores      null
  propertiesFile          /opt/spark/conf/spark-defaults.conf
  driverMemory            null
  driverCores             null
  driverExtraClassPath    $SPARK_HOME/jars/*.jar
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            2
  files                   null
  pyFiles                 hdfs://
  archives                hdfs://
  mainClass               null
  primaryResource         hdfs://
  name                    pytest
  childArgs               []
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark is trying to unpack that tar.gz file.

In the Python code I am trying to import pandas.
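
Once the job gets as far as running Python, a small check like the following could show which interpreter is in use and whether pandas is visible to it. This is only a sketch; describe_import is a hypothetical helper name, and it relies solely on importlib from the standard library.

```python
import importlib.util
import sys


def describe_import(module_name):
    """Report the running interpreter and whether a module can be located."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,           # interpreter actually running
        "found": spec is not None,              # True if the module is importable
        "origin": getattr(spec, "origin", None),  # file it would be loaded from
    }


print(describe_import("pandas"))
```

If "executable" does not point inside the unpacked pyspark_venv directory, the packed environment is probably not the one being used.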

This is what the pod logs show:

Unpacking an archive hdfs:// from /tmp/spark-57c6ace6-c01f-420c-ab88-0cdb9015eb92/pyspark_venv.tar.gz to /opt/spark/work-dir/./pyspark_venv

Exception in thread "main" ExitCodeException exitCode=2: tar:


Cannot open: Cannot allocate memory

However, this works fine when I run the code in local mode as opposed to k8s!
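
I do not know yet whether the container memory limit is the cause, but one difference between local mode and k8s is that the pod runs under a cgroup limit. A minimal sketch for reading that limit from inside the pod is below. The two file paths are the usual cgroup v2 and v1 locations; whether they exist in this particular image is an assumption, and read_memory_limit is my own helper name.

```python
from pathlib import Path

# Usual cgroup v2 and v1 locations of the container memory limit (assumed).
DEFAULT_LIMIT_FILES = (
    "/sys/fs/cgroup/memory.max",
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",
)


def read_memory_limit(paths=DEFAULT_LIMIT_FILES):
    """Return the first readable memory limit in bytes, or None if unlimited or unknown."""
    for candidate in paths:
        try:
            raw = Path(candidate).read_text().strip()
        except OSError:
            continue  # file missing or unreadable, try the next location
        if raw == "max":  # cgroup v2 spelling for "no limit"
            return None
        if raw.isdigit():
            return int(raw)
    return None
```

Comparing the returned value with the size of pyspark_venv.tar.gz and with the memory settings passed to spark-submit might at least show whether the pod is unusually tight on memory.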


