nchammas commented on issue #27928: [SPARK-31167][BUILD] Refactor how we track Python test/build dependencies
URL: https://github.com/apache/spark/pull/27928#issuecomment-602375329

> With this change, we will have to maintain and keep `dev/requirements.txt` up to date.

Maybe this is the disconnect between our points of view, because so far I haven't really been following your objections to pinning. Assuming we pin every library, why do we have to keep `dev/requirements.txt` up-to-date?

As long as we can build the docs, run tests, and do whatever else we need to do as part of regular development, that file can remain frozen as-is for years. It's only when we specifically want to use some new feature of, say, Sphinx, that we need to bump versions. But that will happen very rarely, I imagine not more than once every couple of years.

Does that address your concern? Why do you think we'd need to touch that file more often than once in a long while?

> We shouldn't pin `numpy` to encourage people to test the highest versions. It should ideally be `numpy>=1.7` according to `setup.py`.
>
> * `numpy` is an explicit dependency for ML/MLlib in PySpark.

But the specification of numpy in `dev/requirements.txt` is so that we can build our docs. (It seems strange, but yes, numpy is a requirement to build our Python API docs.)

Maybe we can improve this by replacing numpy in `dev/requirements.txt` with a reference to `setup.py`. That way we can track PySpark dependencies (whether for building the docs or for general execution) in one place. This will also pick up the Pandas requirement. How does that sound?

A separate issue I raised earlier is that, if we want to not pin our build/test dependencies, we need to figure out what to do about the Spark Docker image and CI. Either those will also source the unpinned requirements from the same file, or we go back to having the requirements specified in duplicate--with pinned versions for Docker and CI, and without pinned versions for developers.

Obviously, I'd prefer to pin everything and keep it in one place, but if you want to go one of these routes I guess I'll do that. I just want to understand and try to address your objections before going there.
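
To make the pinned-versus-unpinned distinction above concrete, here is a minimal sketch of a check that separates exactly pinned entries (`name==version`) from everything else in a requirements file. It is not part of the PR: the helper name `split_pinned` and the sample file contents (including the version numbers) are hypothetical, and the parsing is deliberately simplified (it does not handle environment markers or URL requirements).

```python
import re

# Matches "name==version" (an exact pin), optionally with extras such as
# "pkg[extra]==1.0". Wildcard pins like "==1.*" are treated as unpinned.
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?\s*==\s*[^\s*]+$")


def split_pinned(lines):
    """Split requirement lines into (pinned, unpinned) lists.

    Blank lines, comments, and pip options such as "-r other.txt" are skipped.
    This helper is hypothetical and only illustrates the idea.
    """
    pinned, unpinned = [], []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue
        (pinned if _PINNED.match(line) else unpinned).append(line)
    return pinned, unpinned


# Hypothetical file contents; the versions are illustrative only.
example = """\
# docs
sphinx==2.3.1
numpy>=1.7
pandas
""".splitlines()

pinned, unpinned = split_pinned(example)
print("pinned:", pinned)
print("unpinned:", unpinned)
```

A check along these lines could run in CI against whichever file the Docker image and CI install from, so that a fully pinned file stays fully pinned while a separate developer-facing specification remains free to use lower bounds such as `numpy>=1.7`.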
