To whom it may concern,

I am trying to install scikit-learn in a PySpark job using the install_pypi_package PySpark API, but the install fails with:

sc.install_pypi_package("scikit-learn")
Collecting scikit-learn
  Using cached https://files.pythonhosted.org/packages/db/e2/9c0bde5f81394b627f623557690536b12017b84988a4a1f98ec826edab9e/scikit-learn-0.24.0.tar.gz
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn)
Collecting scipy>=0.19.1 (from scikit-learn)
  Using cached https://files.pythonhosted.org/packages/58/9d/8296d8211318d690119eba6d293b7a149c1c51c945342dd4c3816f79e1ba/scipy-1.6.0-cp37-cp37m-manylinux1_x86_64.whl
Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn)
Collecting threadpoolctl>=2.0.0 (from scikit-learn)
  Using cached https://files.pythonhosted.org/packages/f7/12/ec3f2e203afa394a149911729357aa48affc59c20e2c1c8297a60f33f133/threadpoolctl-2.1.0-py3-none-any.whl
Building wheels for collected packages: scikit-learn
  Running setup.py bdist_wheel for scikit-learn: started
  Running setup.py bdist_wheel for scikit-learn: finished with status 'error'
  Complete output from command /tmp/1611000009300-0/bin/python -u -c "import setuptools, tokenize;__file__='/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpry3gf9r0pip-wheel- --python-tag cp37:
  Partial import of sklearn during the build process.
  Traceback (most recent call last):
    File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 201, in check_package_status
      module = importlib.import_module(package)
    File "/tmp/1611000009300-0/lib64/python3.7/importlib/__init__.py", line 127, in import_module
      return _bootstrap._gcd_import(name[level:], package, level)
    File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
    File "<frozen importlib._bootstrap>", line 983, in _find_and_load
    File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
  ModuleNotFoundError: No module named 'scipy'
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 306, in <module>
      setup_package()
    File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 294, in setup_package
      check_package_status('scipy', min_deps.SCIPY_MIN_VERSION)
    File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 227, in check_package_status
      .format(package, req_str, instructions))
  ImportError: scipy is not installed.
  scikit-learn requires scipy >= 0.19.1.

I do not encounter this error with scikit-learn 0.23.2:

sc.install_pypi_package("scikit-learn==0.23.2")
Collecting scikit-learn==0.23.2
  Using cached https://files.pythonhosted.org/packages/f4/cb/64623369f348e9bfb29ff898a57ac7c91ed4921f228e9726546614d63ccb/scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl
Requirement already satisfied: scipy>=0.19.1 in /mnt/tmp/1611000009300-0/lib/python3.7/site-packages (from scikit-learn==0.23.2)
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn==0.23.2)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn==0.23.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /mnt/tmp/1611000009300-0/lib/python3.7/site-packages (from scikit-learn==0.23.2)
Installing collected packages: scikit-learn
Successfully installed scikit-learn-0.23.2

Could you please help me understand why the scikit-learn 0.24 installation fails?

Thank you for your help,
Bertrand
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn