GitHub user dusktreader opened a pull request:
https://github.com/apache/spark/pull/18981
Fixed pandoc dependency issue in python/setup.py
## Problem Description
When pyspark is listed as a dependency of another package, installing that
package causes pyspark's own installation to fail. While the other package is
being installed, pyspark's setup_requires requirements are installed,
including pypandoc. As a result, the exception handling on setup.py:152 never
triggers, because the pypandoc module is in fact available. However,
pypandoc.convert() fails when pandoc itself is not installed (in our use case
it is not), raising an OSError that is not handled, so setup fails.
The following is a sample failure:
```
$ which pandoc
$ pip freeze | grep pypandoc
pypandoc==1.4
$ pip install pyspark
Collecting pyspark
Downloading pyspark-2.2.0.post0.tar.gz (188.3MB)
    100% |████████████████████████████████| 188.3MB 16.8MB/s
Complete output from command python setup.py egg_info:
Maybe try:
sudo apt-get install pandoc
See http://johnmacfarlane.net/pandoc/installing.html for installation options
---------------------------------------------------------------
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-build-mfnizcwa/pyspark/setup.py", line 151, in <module>
    long_description = pypandoc.convert('README.md', 'rst')
  File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 69, in convert
    outputfile=outputfile, filters=filters)
  File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 260, in _convert_input
    _ensure_pandoc_path()
  File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 544, in _ensure_pandoc_path
    raise OSError("No pandoc was found: either install pandoc and add it\n"
OSError: No pandoc was found: either install pandoc and add it
to your PATH or or call pypandoc.download_pandoc(...) or
install pypandoc wheels with included pandoc.
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in
/tmp/pip-build-mfnizcwa/pyspark/
```
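For context, setup.py guards the README conversion with a try/except that only
catches ImportError, so the OSError raised when the pandoc binary itself is
missing escapes and kills egg_info. A simplified sketch of that pattern (the
fallback text and message wording below are placeholders, not the exact source):
```python
import sys

# Fallback used when the README cannot be converted (placeholder text).
long_description = "Apache Spark Python API"

try:
    import pypandoc
    # pypandoc imports fine because setup_requires already pulled it in,
    # but convert() raises OSError when the pandoc binary is not on PATH.
    long_description = pypandoc.convert('README.md', 'rst')
except ImportError:
    print("Could not import pypandoc - required to package PySpark",
          file=sys.stderr)
```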
## What changes were proposed in this pull request?
This change adds an exception handler for that OSError, so pyspark can be
installed client-side without pandoc being present on the system.
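In outline, the handler is widened so that a missing pandoc binary falls back
to the plain description instead of aborting the install. A rough sketch of
the idea (the fallback text and messages are illustrative, not the exact diff):
```python
import sys

long_description = "Apache Spark Python API"  # placeholder fallback

try:
    import pypandoc
    long_description = pypandoc.convert('README.md', 'rst')
except ImportError:
    print("Could not import pypandoc - required to package PySpark",
          file=sys.stderr)
except OSError:
    # pypandoc imports, but the pandoc binary is missing; keep the
    # fallback description and let the install proceed.
    print("Could not convert README.md - pandoc is not installed",
          file=sys.stderr)
```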
## How was this patch tested?
I tested this by building a wheel package of pyspark with the change
applied. Then, in a clean virtual environment with pypandoc installed but
pandoc not available on the system, I installed pyspark from the wheel.
Here is the output:
```
$ pip freeze | grep pypandoc
pypandoc==1.4
$ which pandoc
$ pip install --no-cache-dir ../spark/python/dist/pyspark-2.3.0.dev0-py2.py3-none-any.whl
Processing /home/tbeck/work/spark/python/dist/pyspark-2.3.0.dev0-py2.py3-none-any.whl
Requirement already satisfied: py4j==0.10.6 in /home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages (from pyspark==2.3.0.dev0)
Installing collected packages: pyspark
Successfully installed pyspark-2.3.0.dev0
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dusktreader/spark dusktreader/fix-pandoc-dependency-issue-in-setup_py
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18981.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18981
----
commit edd53828a23561144b535582430c0805fe54b355
Author: Tucker Beck <[email protected]>
Date: 2017-08-17T22:20:47Z
Fixed pandoc dependency issue in python/setup.py
When pyspark is listed as a dependency of another package, installing
the other package will cause an install failure in pyspark. When the
other package is being installed, pyspark's setup_requires requirements
are installed including pypandoc. Thus, the exception handling on
setup.py:152 does not work because the pypandoc module is indeed
available. However, the pypandoc.convert() function fails if pandoc
itself is not installed (in our use cases it is not). This raises an
OSError that is not handled, and setup fails.
This change simply adds an additional exception handler for the OSError
that is raised.
----