I have a simple Python pipeline that uses a publicly available (PyPI)
library.

I can run my pipeline fine using my local runner.

It also runs fine on the Dataflow runner if I provide a setup_file
to the pipeline.

However, when I use a requirements_file instead of a setup_file (the
recommended and cleaner way when a pipeline has only PyPI dependencies),
I get an error on my local machine and the job is never submitted to
Dataflow.
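
For context, the two variants differ only in which dependency flag is passed
to the pipeline. The sketch below builds the argument list for each case. The
helper name build_pipeline_args and the file paths are made up for
illustration; --runner, --requirements_file and --setup_file are the standard
Beam pipeline options. Other options a real Dataflow job needs (project,
temp location and so on) are left out.

```python
def build_pipeline_args(runner, requirements_file=None, setup_file=None):
    """Return command-line style pipeline arguments (hypothetical helper)."""
    args = ["--runner=" + runner]
    if requirements_file is not None:
        args.append("--requirements_file=" + requirements_file)
    if setup_file is not None:
        args.append("--setup_file=" + setup_file)
    return args


# Variant that works: dependencies declared through a setup.py.
setup_args = build_pipeline_args("DataflowRunner", setup_file="./setup.py")

# Variant that fails locally before the job is submitted.
requirements_args = build_pipeline_args(
    "DataflowRunner", requirements_file="/tmp/requirements.txt"
)
```

Either list can be handed to PipelineOptions from
apache_beam.options.pipeline_options when constructing the pipeline.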

I did some digging, and the problem seems to be that when you use a
requirements_file, the Python SDK runs the following command on the
local machine before submitting the job:

python -m pip download --dest /tmp/dataflow-requirements-cache \
    -r /tmp/requirements.txt --exists-action i --no-binary :all:
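
To reproduce the failure outside of Beam, the same pip invocation can be run
directly. This is a minimal sketch that assumes the flags quoted above are
exactly what the SDK passes; the helper names build_pip_download_command and
reproduce are hypothetical, and sys.executable stands in for "python".

```python
import subprocess
import sys


def build_pip_download_command(requirements_path, cache_dir):
    """Rebuild the pip invocation quoted above (hypothetical helper)."""
    return [
        sys.executable, "-m", "pip", "download",
        "--dest", cache_dir,
        "-r", requirements_path,
        "--exists-action", "i",
        "--no-binary", ":all:",
    ]


def reproduce(requirements_path, cache_dir):
    """Run the command and return (exit_code, combined_output)."""
    result = subprocess.run(
        build_pip_download_command(requirements_path, cache_dir),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    return result.returncode, result.stdout
```

Calling reproduce("/tmp/requirements.txt", "/tmp/dataflow-requirements-cache")
should give a non-zero exit code if the problem is in pip itself rather than
in the SDK.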


This command seems to download source distributions of all these other
libraries (apart from the one in my requirements_file):

azure-common-1.1.24.zip
azure-storage-blob-2.1.0.tar.gz
boto3-1.11.9.tar.gz
botocore-1.14.9.tar.gz
certifi-2019.11.28.tar.gz
cffi-1.13.2.tar.gz
pycryptodomex-3.9.6.tar.gz
pyOpenSSL-19.1.0.tar.gz
pytz-2019.3.tar.gz
requests-2.22.0.tar.gz
urllib3-1.25.8.tar.gz

Most of them are downloaded fine, but the error occurs when pip gets to
"cryptography", while installing its build dependencies:

Collecting azure-common<2.0.0
  Using cached azure-common-1.1.24.zip (18 kB)
  Saved /tmp/dataflow-requirements-cache/azure-common-1.1.24.zip
Collecting azure-storage-blob<12.0.0
  Using cached azure-storage-blob-2.1.0.tar.gz (83 kB)
  Saved /tmp/dataflow-requirements-cache/azure-storage-blob-2.1.0.tar.gz
Collecting boto3<1.12,>=1.4.4
  Using cached boto3-1.11.9.tar.gz (98 kB)
  Saved /tmp/dataflow-requirements-cache/boto3-1.11.9.tar.gz
Collecting botocore<1.15,>=1.5.0
  Using cached botocore-1.14.9.tar.gz (6.1 MB)
  Saved /tmp/dataflow-requirements-cache/botocore-1.14.9.tar.gz
Collecting requests<2.23.0
  Using cached requests-2.22.0.tar.gz (113 kB)
  Saved /tmp/dataflow-requirements-cache/requests-2.22.0.tar.gz
Collecting urllib3<1.26.0,>=1.20
  Using cached urllib3-1.25.8.tar.gz (261 kB)
  Saved /tmp/dataflow-requirements-cache/urllib3-1.25.8.tar.gz
Collecting certifi<2021.0.0
  Using cached certifi-2019.11.28.tar.gz (156 kB)
  Saved /tmp/dataflow-requirements-cache/certifi-2019.11.28.tar.gz
Collecting pytz<2021.0
  Using cached pytz-2019.3.tar.gz (312 kB)
  Saved /tmp/dataflow-requirements-cache/pytz-2019.3.tar.gz
Collecting pycryptodomex!=3.5.0,<4.0.0,>=3.2
  Using cached pycryptodomex-3.9.6.tar.gz (15.5 MB)
  Saved /tmp/dataflow-requirements-cache/pycryptodomex-3.9.6.tar.gz
Collecting pyOpenSSL<21.0.0,>=16.2.0
  Using cached pyOpenSSL-19.1.0.tar.gz (160 kB)
  Saved /tmp/dataflow-requirements-cache/pyOpenSSL-19.1.0.tar.gz
Collecting cffi<1.14,>=1.9
  Using cached cffi-1.13.2.tar.gz (460 kB)
  Saved /tmp/dataflow-requirements-cache/cffi-1.13.2.tar.gz
Collecting cryptography<3.0.0,>=1.8.2
  Using cached cryptography-2.8.tar.gz (504 kB)
  Installing build dependencies ... error
  ERROR: Command errored out with exit status 1:
   command: /opt/conda/bin/python /opt/conda/lib/python3.6/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-wq57ycwk/overlay --no-warn-script-location --no-binary :all: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.6.0' wheel 'cffi>=1.8,!=1.11.3; platform_python_implementation != '"'"'PyPy'"'"''
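
With a long log like this, it can help to pick out mechanically which
requirement never reached the cache. The sketch below assumes the log keeps
the "Collecting ..." / "Saved ..." pattern shown above; the function name
find_unsaved_requirements is hypothetical.

```python
import re


def find_unsaved_requirements(pip_output):
    """Return requirements that pip started collecting but never saved."""
    unsaved = []
    current = None  # requirement currently being collected, if any
    for line in pip_output.splitlines():
        stripped = line.strip()
        match = re.match(r"Collecting (\S+)", stripped)
        if match:
            # A new "Collecting" line while one is still pending means
            # the previous requirement was never saved.
            if current is not None:
                unsaved.append(current)
            current = match.group(1)
        elif stripped.startswith("Saved ") and current is not None:
            current = None
    if current is not None:
        unsaved.append(current)
    return unsaved
```

Run against the output above, this should single out the cryptography
requirement, since it is the only "Collecting" entry without a matching
"Saved" line.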



Has anyone else seen this problem, and is there an easy way to fix it?


Thank you!
