I have a simple Python pipeline that uses a publicly available (PyPI) library.
I can run my pipeline fine using my local runner. I can also run it fine with the Dataflow runner if I provide a setup_file to the pipeline. However, when I use a requirements_file instead of a setup_file (the recommended and cleaner way when a pipeline has only PyPI dependencies), I get an error on my local machine and the job is never submitted to Dataflow.

I did some digging, and the problem seems to be that when you use a requirements_file, the Python SDK runs the following command on the local machine before submitting the job:

    python -m pip download --dest /tmp/dataflow-requirements-cache -r /tmp/requirements.txt --exists-action i --no-binary :all:

This command seems to download all these other libraries as source archives (apart from the one in my requirements_file):

    azure-common-1.1.24.zip
    azure-storage-blob-2.1.0.tar.gz
    boto3-1.11.9.tar.gz
    botocore-1.14.9.tar.gz
    certifi-2019.11.28.tar.gz
    cffi-1.13.2.tar.gz
    pycryptodomex-3.9.6.tar.gz
    pyOpenSSL-19.1.0.tar.gz
    pytz-2019.3.tar.gz
    requests-2.22.0.tar.gz
    urllib3-1.25.8.tar.gz

It downloads some of them fine, but the error comes when it reaches "cryptography" and tries to install its build dependencies:

    Collecting azure-common<2.0.0
      Using cached azure-common-1.1.24.zip (18 kB)
      Saved /tmp/dataflow-requirements-cache/azure-common-1.1.24.zip
    Collecting azure-storage-blob<12.0.0
      Using cached azure-storage-blob-2.1.0.tar.gz (83 kB)
      Saved /tmp/dataflow-requirements-cache/azure-storage-blob-2.1.0.tar.gz
    Collecting boto3<1.12,>=1.4.4
      Using cached boto3-1.11.9.tar.gz (98 kB)
      Saved /tmp/dataflow-requirements-cache/boto3-1.11.9.tar.gz
    Collecting botocore<1.15,>=1.5.0
      Using cached botocore-1.14.9.tar.gz (6.1 MB)
      Saved /tmp/dataflow-requirements-cache/botocore-1.14.9.tar.gz
    Collecting requests<2.23.0
      Using cached requests-2.22.0.tar.gz (113 kB)
      Saved /tmp/dataflow-requirements-cache/requests-2.22.0.tar.gz
    Collecting urllib3<1.26.0,>=1.20
      Using cached urllib3-1.25.8.tar.gz (261 kB)
      Saved /tmp/dataflow-requirements-cache/urllib3-1.25.8.tar.gz
    Collecting certifi<2021.0.0
      Using cached certifi-2019.11.28.tar.gz (156 kB)
      Saved /tmp/dataflow-requirements-cache/certifi-2019.11.28.tar.gz
    Collecting pytz<2021.0
      Using cached pytz-2019.3.tar.gz (312 kB)
      Saved /tmp/dataflow-requirements-cache/pytz-2019.3.tar.gz
    Collecting pycryptodomex!=3.5.0,<4.0.0,>=3.2
      Using cached pycryptodomex-3.9.6.tar.gz (15.5 MB)
      Saved /tmp/dataflow-requirements-cache/pycryptodomex-3.9.6.tar.gz
    Collecting pyOpenSSL<21.0.0,>=16.2.0
      Using cached pyOpenSSL-19.1.0.tar.gz (160 kB)
      Saved /tmp/dataflow-requirements-cache/pyOpenSSL-19.1.0.tar.gz
    Collecting cffi<1.14,>=1.9
      Using cached cffi-1.13.2.tar.gz (460 kB)
      Saved /tmp/dataflow-requirements-cache/cffi-1.13.2.tar.gz
    Collecting cryptography<3.0.0,>=1.8.2
      Using cached cryptography-2.8.tar.gz (504 kB)
      Installing build dependencies ... error
      ERROR: Command errored out with exit status 1:
       command: /opt/conda/bin/python /opt/conda/lib/python3.6/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-wq57ycwk/overlay --no-warn-script-location --no-binary :all: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.6.0' wheel 'cffi>=1.8,!=1.11.3; platform_python_implementation != '"'"'PyPy'"'"''

Has anyone else seen this problem, and is there an easy way to fix it? Thank you!
