Re: Python staging file weirdness

2019-12-05 Thread Valentyn Tymofieiev
Filed https://issues.apache.org/jira/browse/BEAM-8900 to address the inefficiency discussed here. Thanks, everyone.
On Thu, Dec 5, 2019 at 2:53 PM Valentyn Tymofieiev wrote:
> Note that so far we have not been staging wheels, since the SDK has no knowledge of the target platform, but there

Re: Python staging file weirdness

2019-12-05 Thread Valentyn Tymofieiev
Note that so far we have not been staging wheels, since the SDK has no knowledge of the target platform, but there is https://issues.apache.org/jira/browse/BEAM-4032 to add this support.
On Thu, Dec 5, 2019 at 2:35 PM Chad Dombrova wrote:
> On Thu, Dec 5, 2019 at 12:36 PM Valentyn Tymofieiev

Re: Python staging file weirdness

2019-12-05 Thread Chad Dombrova
On Thu, Dec 5, 2019 at 12:36 PM Valentyn Tymofieiev wrote:
> Ah nice, so then the workflow would be: download [missing] deps from PyPI into a long-lived cache directory, then copy the same deps into a short-lived temporary directory, using the long-lived cache directory as the source of truth (SoT), then

Re: Python staging file weirdness

2019-12-05 Thread Valentyn Tymofieiev
Ah nice, so then the workflow would be: download [missing] deps from PyPI into a long-lived cache directory, then copy the same deps into a short-lived temporary directory, using the long-lived cache directory as the source of truth (SoT), then stage files from the short-lived temporary directory and clean it up.
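The workflow described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Beam stager's code: `pip_download_cmd`, `stage_requirements`, and the `stage_file` callback are hypothetical names, and the second `pip download` call relies on the suggestion made in this thread that pip will pick up files from the directory passed to `--find-links`, which has not been verified here.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def pip_download_cmd(requirements_file, dest, find_links=None):
    """Build a `pip download` command line (hypothetical helper)."""
    cmd = [
        sys.executable, "-m", "pip", "download",
        "--dest", dest,
        "-r", requirements_file,
        "--exists-action", "i",
        "--no-binary", ":all:",
    ]
    if find_links is not None:
        # Assumption from the thread: pip reads from this directory
        # instead of fetching the same files again.
        cmd += ["--find-links", find_links]
    return cmd


def stage_requirements(requirements_file, cache_dir, stage_file,
                       run=subprocess.check_call):
    """Stage only the files needed for `requirements_file`.

    `stage_file` is a caller-supplied callback (hypothetical) that
    uploads a single local file to the staging location.
    """
    # 1. Fill the long-lived cache with whatever is still missing.
    run(pip_download_cmd(requirements_file, cache_dir))

    # 2. Collect the same deps in a short-lived directory, using the
    #    cache as the source of truth.
    tmp_dir = tempfile.mkdtemp()
    try:
        run(pip_download_cmd(requirements_file, tmp_dir,
                             find_links=cache_dir))
        # 3. Stage only what ended up in the short-lived directory.
        staged = []
        for name in sorted(os.listdir(tmp_dir)):
            stage_file(os.path.join(tmp_dir, name))
            staged.append(name)
        return staged
    finally:
        # 4. Clean up the short-lived directory in every case.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Passing `run` as a parameter keeps the sketch testable without network access; in real use the default `subprocess.check_call` would invoke pip.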

Re: Python staging file weirdness

2019-12-05 Thread Chad Dombrova
Another way to copy only the deps you care about is to use `pip download` to do the copy. I believe you can provide the cache dir to `pip download --find-links` and it will read from that before reading from PyPI (you may also need to set --wheel-dir to the cache dir), and thus it acts as

Re: Python staging file weirdness

2019-12-05 Thread Valentyn Tymofieiev
I looked at the `pip download` command for a bit. The alternative seems to be to parse the output of `python -m pip download --dest . -r requirements.txt --exists-action i --no-binary :all:` and see which files were downloaded and/or skipped since they were already present, and then stage only the files that
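A rough sketch of the output-parsing idea in this message, for illustration only. It assumes pip reports a newly fetched file with a line starting with `Saved` and an already-present file with a line starting with `File was already downloaded`; that wording is an assumption and may differ between pip versions. `files_from_pip_output` is a hypothetical helper name.

```python
import os
import re

# Assumed pip messages; the exact wording may vary across pip versions.
_FILE_LINE = re.compile(
    r"^\s*(?:Saved|File was already downloaded)\s+(?P<path>\S.*?)\s*$"
)


def files_from_pip_output(output):
    """Return the file names that pip reported as saved or already present."""
    names = []
    for line in output.splitlines():
        match = _FILE_LINE.match(line)
        if match:
            name = os.path.basename(match.group("path"))
            if name not in names:  # keep first occurrence, preserve order
                names.append(name)
    return names
```

The returned names could then be used to pick the files to stage out of the cache directory, instead of uploading everything that happens to be there.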

Re: Python staging file weirdness

2019-12-05 Thread Luke Cwik
I think reusing the same cache directory makes sense during downloading, but why do we upload everything that is there?
On Thu, Dec 5, 2019 at 9:24 AM Udi Meiri wrote:
> Looking at the source, it seems that it should be using a os.path.join(tempfile.gettempdir(), 'dataflow-requirements-cache')

Re: Python staging file weirdness

2019-12-04 Thread Luke Cwik
Can we filter the cache directory for only the artifacts that we want, rather than everything that is there?
On Wed, Dec 4, 2019 at 6:56 PM Valentyn Tymofieiev wrote:
> Luke, I am not sure I understand the question. The caching that happens here is implemented in the SDK for requirements packages:

Re: Python staging file weirdness

2019-12-04 Thread Valentyn Tymofieiev
Luke, I am not sure I understand the question. The caching that happens here is implemented in the SDK for requirements packages: https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/apache_beam/runners/portability/stager.py#L161
On Wed, Dec 4, 2019 at 6:19 PM

Re: Python staging file weirdness

2019-12-04 Thread Luke Cwik
Is there a way to use a cache on disk that is separate from the set of packages we use as requirements?
On Wed, Dec 4, 2019 at 5:58 PM Udi Meiri wrote:
> Thanks!
> Another reason to periodically refresh workers.
>
> On Wed, Nov 27, 2019 at 10:37 PM Valentyn Tymofieiev wrote:
>> Test jobs

Re: Python staging file weirdness

2019-12-04 Thread Udi Meiri
Thanks! Another reason to periodically refresh workers.
On Wed, Nov 27, 2019 at 10:37 PM Valentyn Tymofieiev wrote:
> Test jobs specify[1] a requirements.txt file that contains two entries: pyhamcrest, mock.
>
> We download[2] the sources of the packages specified in the requirements file, and

Re: Python staging file weirdness

2019-11-27 Thread Valentyn Tymofieiev
Test jobs specify[1] a requirements.txt file that contains two entries: pyhamcrest, mock. We download[2] the sources of the packages specified in the requirements file, and of the packages they depend on. While doing so, it appears that we use a cache directory on Jenkins to store the sources of the packages [3],

Python staging file weirdness

2019-11-27 Thread Udi Meiri
I was investigating a Dataflow postcommit test failure (endpoints_pb2 missing), and saw this in the staging directory:
$ gsutil ls gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882