[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt
[ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614227#comment-16614227 ]

Scott Jungwirth commented on BEAM-3106:
---------------------------------------

It looks like this particular issue (bigquery) will be fixed in v2.7.0:
https://github.com/apache/beam/commit/fba5e89820b9cab3fa63502030fd465aecf60556#diff-e9d0ab71f74dc10309a29b697ee99330

> Consider not pinning all python dependencies, or moving them to requirements.txt
> ---------------------------------------------------------------------------------
>
>                 Key: BEAM-3106
>                 URL: https://issues.apache.org/jira/browse/BEAM-3106
>             Project: Beam
>          Issue Type: Wish
>          Components: build-system
>    Affects Versions: 2.1.0
>         Environment: python
>            Reporter: Maximilian Roos
>            Priority: Major
>
> Currently all python dependencies are [pinned or capped|https://github.com/apache/beam/blob/master/sdks/python/setup.py#L97].
> While there's a good argument for supplying a `requirements.txt` with well tested dependencies, having them specified in `setup.py` forces them to an exact state on each install of Beam. This makes using Beam in any environment with other libraries nigh on impossible.
> This is particularly severe for the `gcp` dependencies, where we have libraries that won't work with an older version (but Beam _does_ work with a newer version). We have to do a bunch of gymnastics to get the correct versions installed because of this. Unfortunately, airflow repeats this practice and conflicts on a number of dependencies, adding further complication (but, again, there is no real conflict).
> I haven't seen this practice outside of the Apache & Google ecosystem - for example no libraries in numerical python do this. Here's a [discussion on SO|https://stackoverflow.com/questions/28509481/should-i-pin-my-python-dependencies-versions]

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt
[ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614093#comment-16614093 ]

Robert Bradshaw commented on BEAM-3106:
---------------------------------------

Our main requirements now specify version ranges (generally guided by semantic versioning); we should unpin our gcp requirements as well wherever possible.
[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt
[ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614047#comment-16614047 ]

Scott Jungwirth commented on BEAM-3106:
---------------------------------------

I just ran into this issue using Google's Cloud Composer (managed airflow) after adding the 2.6.0 (current latest) beam sdk PyPI package (apache-beam[gcp]>=2.6.0).

Looking at the build log, it looks like apache-beam[gcp] caused a downgrade of some other google-cloud packages:

    ...
    Installing collected packages: pydot, fastavro, pytz, google-cloud-core, google-cloud-bigquery, apache-beam, pysftp, google-cloud-firestore, msgpack, cachecontrol, firebase-admin, webob, bugsnag
      Found existing installation: pytz 2018.5
        Uninstalling pytz-2018.5:
          Successfully uninstalled pytz-2018.5
      Found existing installation: google-cloud-core 0.28.1
        Uninstalling google-cloud-core-0.28.1:
          Successfully uninstalled google-cloud-core-0.28.1
      Found existing installation: google-cloud-bigquery 1.5.0
        Uninstalling google-cloud-bigquery-1.5.0:
          Successfully uninstalled google-cloud-bigquery-1.5.0
      Found existing installation: apache-beam 2.5.0
        Uninstalling apache-beam-2.5.0:
          Successfully uninstalled apache-beam-2.5.0
    Successfully installed apache-beam-2.6.0 bugsnag-3.4.3 cachecontrol-0.12.5 fastavro-0.19.7 firebase-admin-2.13.0 google-cloud-bigquery-0.25.0 google-cloud-core-0.25.0 google-cloud-firestore-0.29.0 msgpack-0.5.6 pydot-1.2.4 pysftp-0.2.9 pytz-2018.4 webob-1.8.2

I tracked this down to the pinned requirement for bigquery: {{google-cloud-bigquery==0.25.0}}
[https://github.com/apache/beam/blob/v2.6.0/sdks/python/setup.py#L140]

This led to these pip warnings:

    $ pipdeptree --warn
    Warning!!! Possibly conflicting dependencies found:
    * google-cloud-storage==1.10.0
     - google-cloud-core [required: <0.29dev,>=0.28.0, installed: 0.25.0]
    * google-cloud-firestore==0.29.0
     - google-cloud-core [required: <0.29dev,>=0.28.0, installed: 0.25.0]
    * pandas-gbq==0.6.0
     - google-cloud-bigquery [required: >=0.32.0, installed: 0.25.0]
    * google-cloud-dataflow==2.5.0
     - apache-beam [required: ==2.5.0, installed: 2.6.0]
    * google-cloud-logging==1.6.0
     - google-cloud-core [required: <0.29dev,>=0.28.0, installed: 0.25.0]

The exception I was getting came from another google cloud storage module:

    File "/usr/local/lib/python2.7/site-packages/google/cloud/storage/blob.py", line 535, in download_to_file
      ...
    File "/usr/local/lib/python2.7/site-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
      response = func()
    File "/usr/local/lib/python2.7/site-packages/google_auth_httplib2.py", line 198, in request
      uri, method, body=body, headers=request_headers, **kwargs)
    TypeError: request() got an unexpected keyword argument 'data'

I was able to work around this issue by explicitly installing the desired versions of the google-cloud-core>=0.28.0 and google-cloud-bigquery>=1.5.0 modules after the apache-beam[gcp]>=2.6.0 module.
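The pipdeptree warnings above come down to comparing each installed version with the range another package declares. A minimal sketch of that check follows, assuming plain dotted version numbers. `parse_version` and `find_conflicts` are hypothetical helpers written for this illustration (they are not part of pip or pipdeptree), and the `<0.29dev` bound is simplified to `<0.29`; real tools apply the full PEP 440 rules.

```python
def parse_version(text):
    """Turn '0.28.1' into (0, 28, 1); only plain dotted integers are handled."""
    return tuple(int(part) for part in text.split("."))


def find_conflicts(installed, requirements):
    """Return (package, dependency, installed_version) for each violated range.

    `requirements` holds (package, dependency, minimum, upper_bound) tuples,
    meaning `package` needs minimum <= dependency < upper_bound.
    An upper_bound of None means the range has no upper limit.
    """
    conflicts = []
    for package, dependency, minimum, upper_bound in requirements:
        found = parse_version(installed[dependency])
        too_old = found < parse_version(minimum)
        too_new = upper_bound is not None and found >= parse_version(upper_bound)
        if too_old or too_new:
            conflicts.append((package, dependency, installed[dependency]))
    return conflicts


# Versions left behind after apache-beam[gcp] 2.6.0 was installed (from the log above).
installed = {"google-cloud-core": "0.25.0", "google-cloud-bigquery": "0.25.0"}

# Ranges taken from the pipdeptree warnings; "<0.29dev" is simplified to "<0.29".
requirements = [
    ("google-cloud-storage", "google-cloud-core", "0.28.0", "0.29"),
    ("google-cloud-firestore", "google-cloud-core", "0.28.0", "0.29"),
    ("pandas-gbq", "google-cloud-bigquery", "0.32.0", None),
]

conflicts = find_conflicts(installed, requirements)
for package, dependency, version in conflicts:
    print("%s is incompatible with %s %s" % (package, dependency, version))
```

Rerunning `find_conflicts` with the pre-downgrade versions from the log (google-cloud-core 0.28.1, google-cloud-bigquery 1.5.0) returns an empty list, which matches the workaround of reinstalling newer versions after Beam.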
[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt
[ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574053#comment-16574053 ]

Ahmet Altay commented on BEAM-3106:
-----------------------------------

[~cclauss] I am not familiar with pipenv. Could you explain how it addresses this problem? What other considerations would there be for us to think about?
[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt
[ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572766#comment-16572766 ]

cclauss commented on BEAM-3106:
-------------------------------

What about moving to https://docs.pipenv.org ?
[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt
[ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341914#comment-16341914 ]

Ahmet Altay commented on BEAM-3106:
-----------------------------------

I mentioned upgrading the libraries that Beam depends on to their newer versions. I do not think there is a good solution today for mixing and matching dependencies in the same virtualenv.
[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt
[ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341912#comment-16341912 ]

RK commented on BEAM-3106:
--------------------------

This can result in some difficult-to-pin-down errors in the google-cloud-platform python libraries. For example, on a clean virtualenv:

{code:java}
pip install google-cloud-storage
{code}

Now:

{code:java}
from google.cloud.storage import Client
print(Client().bucket("gcp-public-data-landsat")\
    .blob("LE00/PRE/001/049/LE07_L1TP_001049_20160215_20161015_01_T1/LE07_L1TP_001049_20160215_20161015_01_T1_ANG.txt")\
    .download_as_string())
{code}

This works as expected, but after installing apache_beam[gcp]:

{code:java}
pip install apache_beam[gcp]
{code}

{code:java}
# same code as above
from google.cloud.storage import Client
print(Client().bucket("gcp-public-data-landsat")\
    .blob("LE00/PRE/001/049/LE07_L1TP_001049_20160215_20161015_01_T1/LE07_L1TP_001049_20160215_20161015_01_T1_ANG.txt")\
    .download_as_string())
# File "/Users/karbr001/Documents/tmp_gcs_test/env/lib/python2.7/site-packages/google_auth_httplib2.py", line 198, in request
#   uri, method, body=body, headers=request_headers, **kwargs)
# TypeError: request() got an unexpected keyword argument 'data'
{code}

[~altay] what's the temporary relief you mentioned? Is it just installing Beam with a custom setup.py file that points to bigquery 0.28?
[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt
[ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313731#comment-16313731 ]

Ahmet Altay commented on BEAM-3106:
-----------------------------------

[~m...@maxroos.com] if it helps, you can upgrade Beam's bigquery version as temporary relief until we have a permanent fix for this issue.
[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt
[ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313574#comment-16313574 ]

Maximilian Roos commented on BEAM-3106:
---------------------------------------

Thanks for your earlier responses, Ahmet.

To give a concrete case, as I find those can be helpful beyond the abstract: Currently Beam pins google-cloud-bigquery to 0.25.0, from [June 26|https://github.com/GoogleCloudPlatform/google-cloud-python/releases/tag/bigquery-0.25.0]. The most up-to-date version is 0.29.0. We have a library that depends on >=0.28.0, which we can't use at the same time as Beam, so we have to set up two separate build paths - one to test with Beam and another to test with the existing library.

Cheers,
Max
[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt
[ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223793#comment-16223793 ]

Ahmet Altay commented on BEAM-3106:
-----------------------------------

This plan sounds good to me. Thank you. I will unassign myself so that anyone in the community can pick this up and implement it. I would be happy to discuss implementation details.

cc: [~markflyhigh] for the mechanics. Mark mentioned earlier an interest in building test framework elements to follow changes in dependencies.
[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt
[ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223716#comment-16223716 ]

Maximilian Roos commented on BEAM-3106:
---------------------------------------

> What do you think about a policy like, reviewing capped dependencies at every release and ensuring that a) we are including latest versions of these known dependencies, b) we are testing with those dependencies before a release.

I think that's a reasonable compromise, thanks Ahmet.

To close this off, here's the system that works really well in the numerical python ecosystem (e.g. pandas / numpy / xarray):
- Run CI tests on a number of released versions of each dependency
- Any backward incompatible changes in dependencies are deprecated in advance, and tests catch those deprecation warnings - giving plenty of time for changes (this relies on all dependencies raising warnings for backward-incompatible changes)
- In the extensively maintained libraries, tests are also run on master branch of dependencies, to quickly flag any potential breakages
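One way to make tests catch deprecation warnings, as described in the list above, is to escalate `DeprecationWarning` to an error while the code under test runs. The sketch below uses only the standard `warnings` module; `old_api` stands in for a hypothetical dependency function and `calls_deprecated_api` is an illustrative helper, neither taken from Beam or from any of the libraries mentioned.

```python
import warnings


def old_api(value):
    """Hypothetical dependency function that announces its upcoming removal."""
    warnings.warn("old_api() is deprecated; use new_api()", DeprecationWarning, stacklevel=2)
    return value * 2


def calls_deprecated_api(func, *args):
    """Return True if calling func(*args) triggers a DeprecationWarning."""
    with warnings.catch_warnings():
        # Escalate deprecation warnings to exceptions inside this block only.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func(*args)
        except DeprecationWarning:
            return True
    return False


print(calls_deprecated_api(old_api, 21))
print(calls_deprecated_api(abs, -3))
```

A CI job that runs the test suite with warnings escalated this way fails as soon as a dependency starts deprecating something the project uses, which is the early signal the approach relies on.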
[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt
[ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222761#comment-16222761 ]

Ahmet Altay commented on BEAM-3106:
-----------------------------------

The advantage of putting them in {{setup.py}} is that the package contains the full dependency information and pip install just works, as opposed to publishing a separate versioned requirements file that needs to be installed. I agree this is a tough problem, but I think the current solution is better than having a requirements file.

What do you think about a policy like reviewing capped dependencies at every release and ensuring that a) we are including the latest versions of these known dependencies, and b) we are testing with those dependencies before a release?
[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt
[ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222698#comment-16222698 ]

Maximilian Roos commented on BEAM-3106:
---------------------------------------

Yes, those are good points, and this is a tough problem, particularly for Beam (or airflow), which relies on a lot of relatively fast-moving dependencies.

That said, the consensus in the python community seems to be that `requirements.txt` is the place to put those pins, rather than `setup.py`, and that putting them in `setup.py` creates more interference than clarity.

Pinning to major versions could be a reasonable compromise. As an example, though, `google-cloud-bigquery` is pinned to a version three releases behind the latest, while Beam works (or at least I haven't had any issues) with the latest.

Thanks Ahmet.
[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt
[ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222645#comment-16222645 ]

Ahmet Altay commented on BEAM-3106:
-----------------------------------

The reason behind pinning or capping dependencies is to prevent releases from breaking as time passes. In the past there were many occasions where a dependency released a backward-incompatible change and broke an already released Beam version. This is bad for existing Beam users, because at some point in the future their currently working Beam release may stop working and force them to upgrade.

We follow the semantic versioning rules (capping) in general with dependencies (for example: 'avro>=1.8.1,<2.0.0'). However, in the past, some dependencies also released breaking changes without incrementing the major version. For those dependencies only, we pin their version to prevent any future breakages.

We can consider an alternative policy to what we are doing today, but it is important for us to ensure (as much as possible) that already released Beam versions will continue to work even after a breaking change in a dependency.
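To make the difference between capping and pinning concrete, here is a small sketch that checks candidate versions against the capped range quoted above ('avro>=1.8.1,<2.0.0') and against the exact bigquery pin reported elsewhere in this thread (google-cloud-bigquery==0.25.0). `parse_version` and `satisfies` are hypothetical helpers that handle only plain dotted integers; the candidate versions 1.9.0 and 2.0.0 are picked purely for illustration, and pip itself applies the full PEP 440 rules.

```python
def parse_version(text):
    """Turn '1.8.1' into (1, 8, 1); only plain dotted integers are handled."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, lower=None, upper=None, exact=None):
    """Check a version against an exact pin or a half-open [lower, upper) range."""
    candidate = parse_version(version)
    if exact is not None:
        return candidate == parse_version(exact)
    if lower is not None and candidate < parse_version(lower):
        return False
    if upper is not None and candidate >= parse_version(upper):
        return False
    return True


# Capped range from the comment above: 'avro>=1.8.1,<2.0.0'.
capped = {"lower": "1.8.1", "upper": "2.0.0"}
# Exact pin reported elsewhere in the thread: 'google-cloud-bigquery==0.25.0'.
pinned = {"exact": "0.25.0"}

print(satisfies("1.9.0", **capped))   # a later 1.x release stays inside the cap
print(satisfies("2.0.0", **capped))   # the next major version is excluded
print(satisfies("0.25.0", **pinned))  # only the pinned version is accepted
print(satisfies("0.29.0", **pinned))  # any newer release is rejected
```

The cap leaves room for other packages in the same environment to ask for a newer compatible release, while the pin admits exactly one version, which is why the pinned gcp dependencies are the ones that collide with other libraries.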