Re: How do you write portable runner pipeline on separate python code ?

2019-09-14 Thread Yu Watanabe
Lukasz,

Thank you for the reply.

> * Using a "remote" filesystem such as HDFS/S3/GCS/...
> * Mounting an external directory into the container so that any "local" writes appear outside the container
> * Using a non-docker environment such as external or process.

Understood.

Thanks,
Yu

Re: How do you write portable runner pipeline on separate python code ?

2019-09-14 Thread Yu Watanabe
Kyle,

Thank you for the advice.

> For example, Yu's pipeline errored here because the expected Docker container wasn't built before running.

I was able to spin up the harness container and submit the job to the job service by preparing the container properly. I needed to do extra steps in the

Re: How do you write portable runner pipeline on separate python code ?

2019-09-13 Thread Robert Bradshaw
Note that loopback won't fix the problem for, say, cross-language IOs. But, yes, it's really handy and should probably be used more.

On Fri, Sep 13, 2019 at 8:29 AM Lukasz Cwik wrote:
> And/or update the wiki/website with some how to's...
>
> On Fri, Sep 13, 2019 at 7:51 AM Thomas Weise wrote:

Re: How do you write portable runner pipeline on separate python code ?

2019-09-13 Thread Lukasz Cwik
And/or update the wiki/website with some how to's...

On Fri, Sep 13, 2019 at 7:51 AM Thomas Weise wrote:
> I agree that loopback would be preferable for this purpose. I just wasn't
> aware this even works with the portable Flink runner. Is it one of the best
> guarded secrets? ;-)
>
> Kyle, can

Re: How do you write portable runner pipeline on separate python code ?

2019-09-13 Thread Thomas Weise
I agree that loopback would be preferable for this purpose. I just wasn't aware this even works with the portable Flink runner. Is it one of the best guarded secrets? ;-) Kyle, can you please post the pipeline options you would use for Flink? On Thu, Sep 12, 2019 at 5:57 PM Kyle Weaver wrote:
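
For readers following along, here is a minimal sketch of what loopback options for the portable runner might look like. The flags `--runner=PortableRunner`, `--job_endpoint`, and `--environment_type=LOOPBACK` are Beam pipeline options; the endpoint `localhost:8099` and the helper name `portable_runner_args` are assumptions made for illustration and are not taken from this thread.

```python
def portable_runner_args(job_endpoint="localhost:8099", environment_type="LOOPBACK"):
    """Build command-line style pipeline options for the portable runner.

    Hypothetical helper. The default endpoint is an assumption; use whatever
    address your job server actually listens on.
    """
    return [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        # LOOPBACK runs the SDK harness in the submitting Python process,
        # so no Docker image has to be pulled or built.
        f"--environment_type={environment_type}",
    ]


if __name__ == "__main__":
    # With Beam installed, this list can be handed to
    # apache_beam.options.pipeline_options.PipelineOptions(args).
    print(portable_runner_args())
```

Keeping the options in a plain list means the same pipeline code can be pointed at a different environment type by changing one argument.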

Re: How do you write portable runner pipeline on separate python code ?

2019-09-13 Thread Fred Tsang
and manually run the portable runner? Of course, the correct way should be docker, but it would be very handy to have this option. Cheers Fred From: Kyle Weaver [mailto:kcwea...@google.com] Sent: 13 September 2019 01:57 To: user@beam.apache.org Cc: dev Subject: EXTERNAL: Re: How do you write portable

Re: How do you write portable runner pipeline on separate python code ?

2019-09-12 Thread Kyle Weaver
I prefer loopback because a) it writes output files to the local filesystem, as the user expects, and b) you don't have to pull or build docker images, or even have docker installed on your system -- which is one less point of failure. Kyle Weaver | Software Engineer | github.com/ibzib |

Re: How do you write portable runner pipeline on separate python code ?

2019-09-12 Thread Thomas Weise
This should become much better with 2.16 when we have the Docker images prebuilt. Docker is probably still the best option for Python on a JVM based runner in a local environment that does not have a development setup. On Thu, Sep 12, 2019 at 1:09 PM Kyle Weaver wrote: > +dev I think we

Re: How do you write portable runner pipeline on separate python code ?

2019-09-12 Thread Kyle Weaver
+dev I think we should probably point new users of the portable Flink/Spark runners to use loopback or some other non-docker environment, as Docker adds some operational complexity that isn't really needed to run a word count example. For example, Yu's pipeline errored here because the expected

Re: How do you write portable runner pipeline on separate python code ?

2019-09-12 Thread Lukasz Cwik
When you use a local filesystem path and a docker environment, "/tmp" is written inside the container. You can solve this issue by:

* Using a "remote" filesystem such as HDFS/S3/GCS/...
* Mounting an external directory into the container so that any "local" writes appear outside the container
*
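
The situation described above can be caught before submitting a job. The following is a rough sketch, not part of Beam: `is_local_path` and `writes_stay_in_container` are hypothetical helpers that only inspect the output path's URL scheme and the environment type string.

```python
from urllib.parse import urlparse


def is_local_path(path):
    """Return True when the path has no remote scheme (hypothetical helper).

    A missing scheme, "file", or a single-letter scheme (a Windows drive
    letter such as C:) is treated as local; anything else as remote.
    """
    scheme = urlparse(path).scheme.lower()
    return scheme in ("", "file") or len(scheme) == 1


def writes_stay_in_container(output_path, environment_type):
    """True when a docker environment would write a local path inside the container."""
    return environment_type.upper() == "DOCKER" and is_local_path(output_path)


if __name__ == "__main__":
    for path, env in [("/tmp/out", "DOCKER"),
                      ("gs://bucket/out", "DOCKER"),
                      ("/tmp/out", "LOOPBACK")]:
        print(path, env, writes_stay_in_container(path, env))
```

A check like this only flags the mismatch. Fixing it still means choosing one of the options listed above.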