[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt

2018-09-13 Thread Scott Jungwirth (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614227#comment-16614227
 ] 

Scott Jungwirth commented on BEAM-3106:
---

It looks like this particular issue (bigquery) will be fixed in v2.7.0 
https://github.com/apache/beam/commit/fba5e89820b9cab3fa63502030fd465aecf60556#diff-e9d0ab71f74dc10309a29b697ee99330

> Consider not pinning all python dependencies, or moving them to 
> requirements.txt
> 
>
> Key: BEAM-3106
> URL: https://issues.apache.org/jira/browse/BEAM-3106
> Project: Beam
> Issue Type: Wish
> Components: build-system
> Affects Versions: 2.1.0
> Environment: python
> Reporter: Maximilian Roos
> Priority: Major
>
> Currently all python dependencies are [pinned or 
> capped|https://github.com/apache/beam/blob/master/sdks/python/setup.py#L97].
> While there's a good argument for supplying a `requirements.txt` with 
> well-tested dependencies, having them specified in `setup.py` forces them to an 
> exact state on each install of Beam. This makes using Beam in any environment 
> with other libraries nigh on impossible. 
> This is particularly severe for the `gcp` dependencies, where we have 
> libraries that won't work with an older version (but Beam _does_ work with a 
> newer version). We have to do a bunch of gymnastics to get the correct 
> versions installed because of this. Unfortunately, airflow repeats this 
> practice and conflicts on a number of dependencies, adding further 
> complication (but, again, there is no real conflict).
> I haven't seen this practice outside of the Apache & Google ecosystem - for 
> example, no libraries in numerical python do this. Here's a [discussion on 
> SO|https://stackoverflow.com/questions/28509481/should-i-pin-my-python-dependencies-versions].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt

2018-09-13 Thread Robert Bradshaw (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614093#comment-16614093
 ] 

Robert Bradshaw commented on BEAM-3106:
---

Our main requirements now specify version ranges (generally guided by semantic 
versioning); we should unpin our gcp requirements as well wherever possible.



[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt

2018-09-13 Thread Scott Jungwirth (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614047#comment-16614047
 ] 

Scott Jungwirth commented on BEAM-3106:
---

I just ran into this issue using Google's Cloud Composer (managed airflow) 
after adding the 2.6.0 (current latest) Beam SDK PyPI package 
(apache-beam[gcp]>=2.6.0). Looking at the build log, it looks like 
apache-beam[gcp] caused a downgrade of some other google-cloud packages:
...
Installing collected packages: pydot, fastavro, pytz, google-cloud-core, google-cloud-bigquery, apache-beam, pysftp, google-cloud-firestore, msgpack, cachecontrol, firebase-admin, webob, bugsnag
  Found existing installation: pytz 2018.5
    Uninstalling pytz-2018.5:
      Successfully uninstalled pytz-2018.5
  Found existing installation: google-cloud-core 0.28.1
    Uninstalling google-cloud-core-0.28.1:
      Successfully uninstalled google-cloud-core-0.28.1
  Found existing installation: google-cloud-bigquery 1.5.0
    Uninstalling google-cloud-bigquery-1.5.0:
      Successfully uninstalled google-cloud-bigquery-1.5.0
  Found existing installation: apache-beam 2.5.0
    Uninstalling apache-beam-2.5.0:
      Successfully uninstalled apache-beam-2.5.0
Successfully installed apache-beam-2.6.0 bugsnag-3.4.3 cachecontrol-0.12.5 fastavro-0.19.7 firebase-admin-2.13.0 google-cloud-bigquery-0.25.0 google-cloud-core-0.25.0 google-cloud-firestore-0.29.0 msgpack-0.5.6 pydot-1.2.4 pysftp-0.2.9 pytz-2018.4 webob-1.8.2
I tracked this down to the pinned requirement for bigquery: 
{{google-cloud-bigquery==0.25.0}}  
[https://github.com/apache/beam/blob/v2.6.0/sdks/python/setup.py#L140]

This led to the following pipdeptree warnings:
$ pipdeptree --warn
Warning!!! Possibly conflicting dependencies found:
* google-cloud-storage==1.10.0
 - google-cloud-core [required: <0.29dev,>=0.28.0, installed: 0.25.0]
* google-cloud-firestore==0.29.0
 - google-cloud-core [required: <0.29dev,>=0.28.0, installed: 0.25.0]
* pandas-gbq==0.6.0
 - google-cloud-bigquery [required: >=0.32.0, installed: 0.25.0]
* google-cloud-dataflow==2.5.0
 - apache-beam [required: ==2.5.0, installed: 2.6.0]
* google-cloud-logging==1.6.0
 - google-cloud-core [required: <0.29dev,>=0.28.0, installed: 0.25.0]
The exception I was getting came from another module, google-cloud-storage:
  File "/usr/local/lib/python2.7/site-packages/google/cloud/storage/blob.py", line 535, in download_to_file
    ...
  File "/usr/local/lib/python2.7/site-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
    response = func()
  File "/usr/local/lib/python2.7/site-packages/google_auth_httplib2.py", line 198, in request
    uri, method, body=body, headers=request_headers, **kwargs)
TypeError: request() got an unexpected keyword argument 'data'
 

I was able to work around this issue by explicitly installing the desired 
versions of the google-cloud-core>=0.28.0 and google-cloud-bigquery>=1.5.0 
modules after the apache-beam[gcp]>=2.6.0 module.
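
The install order described above can be sketched as a small Python script. This is only an illustration: the helper name `pip_install_commands` and the `STEPS` list are hypothetical, and the actual install call is left commented out so the script just prints the commands. Note that after the second step pip may still report that the installed versions do not match what apache-beam declares.

```python
import sys

# Requirement strings taken from the comment above, grouped by install step.
# The second step deliberately runs after Beam so that its pinned versions
# are replaced by the newer ones.
STEPS = [
    ["apache-beam[gcp]>=2.6.0"],
    ["google-cloud-core>=0.28.0", "google-cloud-bigquery>=1.5.0"],
]


def pip_install_commands(steps):
    """Build one 'python -m pip install ...' command per step, in order."""
    return [[sys.executable, "-m", "pip", "install"] + list(requirements)
            for requirements in steps]


if __name__ == "__main__":
    for command in pip_install_commands(STEPS):
        print(" ".join(command))
        # To actually install, import subprocess and run:
        # subprocess.check_call(command)
```

Using `sys.executable -m pip` keeps the install in the same interpreter (and virtualenv) that runs the script.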

 

 



[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt

2018-08-08 Thread Ahmet Altay (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574053#comment-16574053
 ] 

Ahmet Altay commented on BEAM-3106:
---

[~cclauss] I am not familiar with pipenv. Could you explain how it addresses 
this problem? What other considerations would there be for us to think about?



[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt

2018-08-08 Thread cclauss (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572766#comment-16572766
 ] 

cclauss commented on BEAM-3106:
---

What about moving to https://docs.pipenv.org ?



[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt

2018-01-26 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341914#comment-16341914
 ] 

Ahmet Altay commented on BEAM-3106:
---

I mentioned upgrading the libraries that Beam depends on to their newer 
versions. I do not think there is a good solution today for mixing and matching 
dependencies in the same virtualenv.



[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt

2018-01-26 Thread RK (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341912#comment-16341912
 ] 

RK commented on BEAM-3106:
--

This can result in some difficult-to-pin-down errors in the 
google-cloud-platform python libraries. For example, on a clean virtualenv:

 
{code:java}
pip install google-cloud-storage
{code}
Now:

 

 
{code:java}
from google.cloud.storage import Client

# Public Landsat object, used here only to exercise a download.
BLOB_PATH = ("LE00/PRE/001/049/LE07_L1TP_001049_20160215_20161015_01_T1/"
             "LE07_L1TP_001049_20160215_20161015_01_T1_ANG.txt")
print(Client().bucket("gcp-public-data-landsat").blob(BLOB_PATH).download_as_string())
{code}
This works as expected, but after installing apache_beam[gcp]:

 

 
{code:java}
pip install apache_beam[gcp]
{code}
 
{code:java}
# same code as above
from google.cloud.storage import Client

BLOB_PATH = ("LE00/PRE/001/049/LE07_L1TP_001049_20160215_20161015_01_T1/"
             "LE07_L1TP_001049_20160215_20161015_01_T1_ANG.txt")
print(Client().bucket("gcp-public-data-landsat").blob(BLOB_PATH).download_as_string())
# File "/Users/karbr001/Documents/tmp_gcs_test/env/lib/python2.7/site-packages/google_auth_httplib2.py", line 198, in request
#    uri, method, body=body, headers=request_headers, **kwargs)
# TypeError: request() got an unexpected keyword argument 'data'
{code}
[~altay] What's the temporary relief you mentioned? Is it just installing Beam 
with a custom setup.py file that points to bigquery 0.28?

 

 

 

 



[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt

2018-01-05 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313731#comment-16313731
 ] 

Ahmet Altay commented on BEAM-3106:
---

[~m...@maxroos.com] if it helps, you can upgrade Beam's bigquery version as a 
temporary relief until we have a permanent fix for this issue.



[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt

2018-01-05 Thread Maximilian Roos (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313574#comment-16313574
 ] 

Maximilian Roos commented on BEAM-3106:
---

Thanks for your earlier responses Ahmet. 

To give a concrete case, as I find those can be helpful beyond the abstract: 
Currently Beam pins google-cloud-bigquery to 0.25.0, from [June 
26|https://github.com/GoogleCloudPlatform/google-cloud-python/releases/tag/bigquery-0.25.0].
 The most up-to-date is 0.29.0. 

We have a library that depends on >=0.28.0, which we can't use at the same time 
as Beam. And we have to set up two separate build paths - one to test with Beam 
and another to test with the existing library.

Cheers, Max
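
The conflict described in this comment can be shown with a minimal Python sketch. It is not how pip resolves dependencies; it only checks which candidate versions every requirement accepts. The candidate list is illustrative, the function names are hypothetical, and `relaxed_beam` is an assumed capped range, not an actual Beam requirement.

```python
def as_tuple(version):
    """Turn '0.28.0' into (0, 28, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Illustrative candidate releases between the pinned and the latest version.
CANDIDATES = ["0.25.0", "0.26.0", "0.27.0", "0.28.0", "0.29.0"]


def beam_pin(version):
    """Beam's requirement as described above: exactly 0.25.0."""
    return as_tuple(version) == as_tuple("0.25.0")


def other_library(version):
    """The other library's requirement as described above: >=0.28.0."""
    return as_tuple(version) >= as_tuple("0.28.0")


def relaxed_beam(version):
    """Hypothetical capped range >=0.25.0,<0.30.0 (an assumption)."""
    return as_tuple("0.25.0") <= as_tuple(version) < as_tuple("0.30.0")


def compatible_versions(candidates, requirements):
    """Return the candidates that every requirement accepts."""
    return [version for version in candidates
            if all(accepts(version) for accepts in requirements)]


# With the exact pin no version satisfies both packages.
print(compatible_versions(CANDIDATES, [beam_pin, other_library]))
# With the hypothetical range the two requirements overlap.
print(compatible_versions(CANDIDATES, [relaxed_beam, other_library]))
```

With the exact pin the intersection is empty, which is why two separate build paths are needed; a range leaves room for both packages.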



[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt

2017-10-28 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223793#comment-16223793
 ] 

Ahmet Altay commented on BEAM-3106:
---

This plan sounds good to me. Thank you. I will unassign so that anyone in the 
community can pick up and implement this. I would be happy to discuss 
implementation details.

cc: [~markflyhigh] for the mechanics. Mark earlier mentioned an interest in 
building test framework elements to follow changes in dependencies.

> Consider not pinning all python dependencies, or moving them to 
> requirements.txt
> 
>
> Key: BEAM-3106
> URL: https://issues.apache.org/jira/browse/BEAM-3106
> Project: Beam
> Issue Type: Wish
> Components: build-system
> Affects Versions: 2.1.0
> Environment: python
> Reporter: Maximilian Roos
> Assignee: Ahmet Altay


[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt

2017-10-28 Thread Maximilian Roos (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223716#comment-16223716
 ] 

Maximilian Roos commented on BEAM-3106:
---

> What do you think about a policy like, reviewing capped dependencies at every 
> release and ensuring that a) we are including latest versions of these known 
> dependencies, b) we are testing with those dependencies before a release.

I think that's a reasonable compromise, thanks Ahmet. 

To close this off, here's the system that works really well in the numerical 
python ecosystem (e.g. pandas / numpy / xarray):
- Run CI tests on a number of released versions of each dependency
- Any backward incompatible changes in dependencies are deprecated in advance, 
and tests catch those deprecation warnings - giving plenty of time for changes 
(this relies on all dependencies raising warnings for backward-incompatible 
changes)
- In the extensively maintained libraries, tests are also run on master branch 
of dependencies, to quickly flag any potential breakages
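
The first two points above can be sketched in Python as follows. The version numbers in `MATRIX` are hypothetical placeholders, and `build_matrix`, `call_failing_on_deprecation` and `old_api` are illustrative names rather than anything these projects actually use. A real CI setup would install each combination into its own environment before running the tests.

```python
import itertools
import warnings

# Hypothetical released versions to test against; a real matrix would list
# the versions the project claims to support.
MATRIX = {
    "numpy": ["1.13", "1.14", "1.15"],
    "pandas": ["0.22", "0.23"],
}


def build_matrix(versions_by_package):
    """Return one {package: version} dict per combination of versions."""
    names = sorted(versions_by_package)
    combos = itertools.product(*(versions_by_package[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def call_failing_on_deprecation(func, *args, **kwargs):
    """Call func with DeprecationWarning promoted to an exception, so a
    test using a deprecated dependency API fails instead of passing."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return func(*args, **kwargs)


def old_api():
    """Stand-in for a dependency function that announces its removal."""
    warnings.warn("old_api is deprecated", DeprecationWarning)
    return "ok"


if __name__ == "__main__":
    for combination in build_matrix(MATRIX):
        print(combination)
    try:
        call_failing_on_deprecation(old_api)
    except DeprecationWarning as warning:
        print("caught early: %s" % warning)
```

Promoting the warning to an error is what gives the advance notice: the test suite fails while the old API still works, before the dependency removes it.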





[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt

2017-10-27 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222761#comment-16222761
 ] 

Ahmet Altay commented on BEAM-3106:
---

The advantage of putting them in {{setup.py}} is that the package contains the 
full dependency information and pip install just works, as opposed to publishing 
a separate versioned requirements file that needs to be installed.

I agree this is a tough problem, but I think the current solution is better 
than having a requirements file.

What do you think about a policy of reviewing capped dependencies at every 
release and ensuring that a) we are including the latest versions of these known 
dependencies, and b) we are testing with those dependencies before a release?



[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt

2017-10-27 Thread Maximilian Roos (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222698#comment-16222698
 ] 

Maximilian Roos commented on BEAM-3106:
---

Yes, those are good points, and this is a tough problem; particularly for Beam 
(or airflow) which relies on a lot of relatively fast-moving dependencies. 

That said, the consensus in the python community seems to be that 
`requirements.txt` is the place to put those pins, rather than `setup.py`, and 
that putting them in `setup.py` creates more interference than clarity.

Pinning to major versions could be a reasonable compromise. Though, as an 
example, `google-cloud-bigquery` is pinned to a version three releases behind 
the latest, while it works (or at least I haven't had any issues) with the latest.

Thanks Ahmet.




[jira] [Commented] (BEAM-3106) Consider not pinning all python dependencies, or moving them to requirements.txt

2017-10-27 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222645#comment-16222645
 ] 

Ahmet Altay commented on BEAM-3106:
---

The reason behind pinning or capping dependencies is to prevent broken releases 
after time passes. In the past there were many occasions of a dependency 
releasing a backward-incompatible change and breaking an already released Beam 
version. This is bad for existing Beam users, because at some point in the 
future their currently working Beam release may stop working and force them to 
upgrade.

We follow the semantic versioning rules (capping) in general with dependencies 
(for example: 'avro>=1.8.1,<2.0.0'). However, in the past, some dependencies 
also released breaking changes without incrementing the major version. For 
those dependencies only, we pin their version to prevent any future breakages.
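
To make the difference between capping and pinning concrete, here is a minimal Python sketch. It is not how pip or setuptools evaluate specifiers (they follow PEP 440); the helper names `parse_version` and `satisfies` are hypothetical, and only plain dotted numeric versions are handled. The `avro` range is the example above, and `==0.25.0` stands in for a pinned dependency.

```python
def parse_version(text):
    """Turn '1.8.1' into (1, 8, 1). Trailing zeros are dropped so that
    '2.0' and '2.0.0' compare as equal."""
    parts = [int(part) for part in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


# Two-character operators come first so '>=' is not mistaken for '>'.
OPERATORS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def satisfies(version, specifier):
    """Check a version against a comma-separated specifier such as
    '>=1.8.1,<2.0.0' (capped) or '==0.25.0' (pinned)."""
    candidate = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS.items():
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not compare(candidate, bound):
                    return False
                break
        else:
            raise ValueError("unsupported clause: %r" % clause)
    return True


# Capped: any 1.x release from 1.8.1 onward is accepted, 2.0.0 is not.
print(satisfies("1.8.2", ">=1.8.1,<2.0.0"))
print(satisfies("2.0.0", ">=1.8.1,<2.0.0"))
# Pinned: only the exact version is accepted.
print(satisfies("0.28.0", "==0.25.0"))
```

A cap leaves room for compatible upgrades within a major version, while a pin rejects every other release, including ones that would have worked.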

We can consider an alternative policy to what we are doing today, but it is 
important for us to ensure (as much as possible) that already released Beam 
versions will continue to work even after a breaking change in a dependency. 

