Hi,
Maybe someone can shed some light on this.
I am running a PySpark job in minikube.
Because it is PySpark, the following two options are passed to spark-submit:
spark-submit --verbose \
--master k8s://$K8S_SERVER \
--deploy-mode cluster \
--name pytest \
--py-files hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/DSBQ.zip \
--archives hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/${pyspark_venv}.tar.gz#${pyspark_venv} \
The first option, --py-files, sends the zipped PySpark project.
The second one, --archives, sends the package dependencies created
with conda.
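To make the "#" suffix on --archives concrete: the part after "#" is the directory name the archive gets unpacked under in the working directory (the pod log further down shows it landing in ./pyspark_venv). A minimal Python sketch of that split follows. split_archive_uri is a hypothetical helper for illustration only, not a Spark API, and the fallback to the file name when no "#alias" is given is my assumption about the behaviour.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def split_archive_uri(uri):
    """Return (source_uri, unpack_dir_name) for an '--archives' style URI.

    Hypothetical helper for illustration only; not part of Spark.
    Assumption: without a '#alias' fragment the archive file name is used.
    """
    parts = urlsplit(uri)
    # Rebuild the URI without the fragment to get the real download location.
    source = urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
    alias = parts.fragment or posixpath.basename(parts.path)
    return source, alias


if __name__ == "__main__":
    uri = "hdfs://50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv"
    source, alias = split_archive_uri(uri)
    print("download from:", source)
    print("unpack under :", alias)
```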
This is the verbose output from spark-submit:
Parsed arguments:
master k8s://192.168.49.2:8443
deployMode cluster
executorMemory 5000m
executorCores 1
totalExecutorCores null
propertiesFile /opt/spark/conf/spark-defaults.conf
driverMemory null
driverCores null
driverExtraClassPath $SPARK_HOME/jars/*.jar
driverExtraLibraryPath null
driverExtraJavaOptions null
supervise false
queue null
numExecutors 2
files null
pyFiles hdfs://50.140.197.220:9000/minikube/codes/DSBQ.zip
archives hdfs://50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv
mainClass null
primaryResource hdfs://50.140.197.220:9000/minikube/codes/testpackages.py
name pytest
childArgs []
jars null
packages null
packagesExclusions null
repositories null
verbose true
Spark then tries to unpack that tar.gz archive.
In the Python code I am trying to import pandas.
This is what the pod logs show:
Unpacking an archive hdfs://50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv from /tmp/spark-57c6ace6-c01f-420c-ab88-0cdb9015eb92/pyspark_venv.tar.gz to /opt/spark/work-dir/./pyspark_venv
Exception in thread "main" ExitCodeException exitCode=2: tar: lib/python3.7/site-packages/pandas/tests/util/__pycache__/test_assert_categorical_equal.cpython-37.pyc: Cannot open: Cannot allocate memory
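Since tar fails while writing a file under pandas/tests/__pycache__, one thing that may be worth checking is how large the archive is once unpacked, and how much of it is test code and bytecode that the job probably does not need. The sketch below is only a diagnostic under that assumption, not a confirmed fix. summarize_archive and its skip_parts default are names I made up for illustration, and the script is meant to be run against a local copy of the tar.gz.

```python
import sys
import tarfile


def summarize_archive(path, skip_parts=("__pycache__", "tests")):
    """Count regular files and uncompressed bytes in a tar archive.

    Also reports what would remain if any path component listed in
    skip_parts were left out. skip_parts is an illustrative default.
    """
    total_files = total_bytes = kept_files = kept_bytes = 0
    with tarfile.open(path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            total_files += 1
            total_bytes += member.size
            # Keep the member only if none of its path components is skipped.
            if not any(part in skip_parts for part in member.name.split("/")):
                kept_files += 1
                kept_bytes += member.size
    return {
        "total_files": total_files,
        "total_bytes": total_bytes,
        "kept_files": kept_files,
        "kept_bytes": kept_bytes,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_archive.py pyspark_venv.tar.gz
    for key, value in summarize_archive(sys.argv[1]).items():
        print(key, value)
```

The total_bytes figure can then be compared against the memory and disk actually available to the driver pod.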
However, this works fine when I run the code in local mode as opposed to k8s!
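To compare the two environments, a small probe like the one below could be run in local mode and inside the pod. It reports which Python interpreter is in use and whether pandas can be located, without actually importing it. probe_module is a hypothetical helper name, and this is only a sketch for narrowing the difference down.

```python
import importlib.util
import sys


def probe_module(name):
    """Report whether a top-level module can be found, and from where."""
    spec = importlib.util.find_spec(name)  # None if the module is not found
    return {
        "module": name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "python": sys.executable,
    }


if __name__ == "__main__":
    print(probe_module("pandas"))
```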
Thanks