[GitHub] [spark] nchammas commented on issue #27928: [SPARK-31167][BUILD] Refactor how we track Python test/build dependencies

GitBox Sun, 22 Mar 2020 21:14:46 -0700

URL: https://github.com/apache/spark/pull/27928#issuecomment-602375329
 
 
   > With this change, we will have to maintain and keep `dev/requirements.txt` 
up to date. 
   
   Maybe this is the disconnect between our points of view, because so far I 
haven't really been following your objections to pinning. Assuming we pin every 
library, why do we have to keep `dev/requirements.txt` up-to-date?
   
   As long as we can build the docs, run tests, and do whatever else we need to 
do as part of regular development, that file can remain frozen as-is for years.
   
   It's only when we specifically want to use some new feature of, say, Sphinx, 
that we need to bump versions. That will happen very rarely; I imagine not 
more than once every couple of years.
   
   Does that address your concern? Why do you think we'd need to touch that 
file more often than once in a long while?
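   
   To make "pin every library" concrete, here is a minimal sketch of a check 
that flags entries not pinned with `==`. The helper name 
`unpinned_requirements` and the Sphinx version in the example are hypothetical 
and only for illustration; they are not taken from this PR.

```python
import re

# A requirement counts as pinned here if it names one exact version with "==",
# e.g. "sphinx==2.3.1". Extras such as "pkg[extra]==1.0" are allowed.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*\S+")


def unpinned_requirements(text):
    """Return the requirement lines in `text` that are not pinned with `==`."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Hypothetical file contents; the versions are illustrative only.
example = """
sphinx==2.3.1
numpy>=1.7
flake8
"""
print(unpinned_requirements(example))  # ['numpy>=1.7', 'flake8']
```

   A frozen file passes this check indefinitely, which is the point: nothing 
in it changes until someone deliberately bumps a version.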
   
   > We shouldn't pin `numpy` to encourage people to test the highest versions. 
It should ideally be `numpy>=1.7` according to `setup.py`.
   > 
   > * `numpy` is an explicit dependency for ML/MLlib in PySpark.
   
   But the specification of numpy in `dev/requirements.txt` is so that we can 
build our docs. (It seems strange, but yes, numpy is a requirement to build our 
Python API docs.)
   
   Maybe we can improve this by replacing numpy in `dev/requirements.txt` with 
a reference to `setup.py`. That way we can track PySpark dependencies (whether 
for building the docs or for general execution) in one place. This will also 
pick up the Pandas requirement. How does that sound?
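   
   As a rough sketch of the "one place" idea: the runtime requirements could 
live in a single list, with the dev-only tools layered on top. The names 
`RUNTIME_REQUIRES`, `DEV_ONLY`, and `render_requirements` are hypothetical, and 
this is not how `setup.py` or `dev/requirements.txt` are wired up today.

```python
# Hypothetical single source of truth for PySpark's runtime requirements.
# "numpy>=1.7" is the constraint quoted above; the pandas entry is left
# unversioned because this sketch does not assume a specific constraint.
RUNTIME_REQUIRES = ["numpy>=1.7", "pandas"]

# Hypothetical dev-only tools needed for docs and tests.
DEV_ONLY = ["sphinx", "numpy>=1.7"]


def render_requirements(runtime, dev_only):
    """Merge both lists into requirements-file text, dropping duplicates."""
    seen = set()
    lines = []
    for req in list(runtime) + list(dev_only):
        if req not in seen:  # keep first occurrence, preserve order
            seen.add(req)
            lines.append(req)
    return "\n".join(lines) + "\n"


print(render_requirements(RUNTIME_REQUIRES, DEV_ONLY), end="")
```

   With something like this, numpy and Pandas are declared once and the docs 
build picks them up without a second copy of the constraint.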
   
   A separate issue I raised earlier is that, if we don't want to pin our 
build/test dependencies, we need to figure out what to do about the Spark 
Docker image and CI. Either those will also source the unpinned requirements 
from the same file, or we go back to specifying the requirements in two 
places: with pinned versions for Docker and CI, and without pinned versions 
for developers.
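   
   If we did go the duplicate route, we would probably want some guard 
against the two lists drifting apart. A minimal sketch, assuming both lists 
are plain requirements-style text; `requirement_names` and `drift` are 
hypothetical helpers, not existing Spark tooling:

```python
import re


def requirement_names(text):
    """Extract normalized package names from requirements-style text."""
    names = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            # Treat "-", "_" and "." as equivalent and ignore case.
            names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    return names


def drift(pinned_text, unpinned_text):
    """Return (names only in the pinned list, names only in the unpinned list)."""
    pinned = requirement_names(pinned_text)
    unpinned = requirement_names(unpinned_text)
    return sorted(pinned - unpinned), sorted(unpinned - pinned)


# Hypothetical contents of the two lists; versions are illustrative only.
print(drift("sphinx==2.3.1\nnumpy==1.18.1\n", "numpy>=1.7\nflake8\n"))
```

   Needing a check like this at all is part of why I'd rather keep a single 
list.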
   
   Obviously, I'd prefer to pin everything and keep it in one place, but if you 
want to go one of these routes I guess I'll do that. I just want to understand 
and try to address your objections before going there.


