Hi,

This is a bit of old hat, but it is worth getting opinions on.
The current options, as I see them, are:

1. Installing the packages individually via pip during the Docker build
2. Installing them together via pip during the Docker build from a requirements.txt file
3. Installing them to a volume and adding that volume to PYTHONPATH

From my experience, there is a case for installing them at Docker build time:

    RUN pip install --no-cache-dir pyyaml
    RUN pip install --no-cache-dir -r requirements.txt

or, alternatively, shipping a packed virtual environment via spark-submit:

    --archives pyspark_venv.tar.gz#environment

The problem with archives, as I have noticed, is that unzipping and untarring the packages takes considerable time, and spark-submit sometimes hangs. With packages built into the Docker image, the package versions may get out of date, although this has not been an issue for me. So there are pros and cons either way. However, with a CI/CD pipeline we can rebuild the Docker images as frequently as needed.

Docker images have the drawback that the more packages you install, the larger the image becomes, and pulling it from the container registry (ECR, GCR etc.) takes longer and impacts deployment time.

I still favour options 1 or 2 above. I have sketched both approaches in the postscripts below.

Thanks,

Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom

view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh
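PS. For option 2, a minimal sketch of the relevant Dockerfile section. The base image tag, paths and UID are illustrative; adjust them to whatever Spark image you build from:

    # Start from a Spark image that ships with Python (tag is illustrative)
    FROM apache/spark-py:v3.4.0

    USER root

    # Copy the pinned dependency list into the image
    COPY requirements.txt /opt/app/requirements.txt

    # Install everything in one layer; --no-cache-dir keeps the image smaller
    RUN pip install --no-cache-dir -r /opt/app/requirements.txt

    # Copy the application code
    COPY src/ /opt/app/

    # Drop back to the non-root Spark user (UID 185 in the Apache images)
    USER 185

Pinning exact versions in requirements.txt keeps the builds reproducible, and the CI/CD pipeline can bump the pins on its own schedule, which addresses the staleness point above.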
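PPS. For completeness, the --archives route follows the venv-pack workflow described in the Spark "Python Package Management" documentation. Roughly (the package list and app.py are illustrative):

    # Build and pack a virtual environment on the submitting machine
    python -m venv pyspark_venv
    source pyspark_venv/bin/activate
    pip install --no-cache-dir pyyaml venv-pack
    venv-pack -o pyspark_venv.tar.gz

    # Driver (client mode) uses the local venv; executors use the unpacked archive
    export PYSPARK_DRIVER_PYTHON=python
    export PYSPARK_PYTHON=./environment/bin/python
    spark-submit --archives pyspark_venv.tar.gz#environment app.py

The #environment suffix sets the directory name the archive is unpacked into on each executor, which is why PYSPARK_PYTHON points at ./environment/bin/python. Note that the unpacking cost I complained about is paid on every executor at startup.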