Re: Python Portable Runner Issues

2019-10-01 Thread Robert Bradshaw
Different runners have different characteristics, pros, and cons, which is part of the value proposition for Beam. We have some comparisons at https://beam.apache.org/documentation/runners/capability-matrix/ but these were put together a while ago and don't really take into account the state of thi

Re: Python Portable Runner Issues

2019-10-01 Thread Maximilian Michels
> Probably the most stable is running on Dataflow still. But I’m excited to see the progress towards a Spark runner, can’t wait to try TFT on it :)
That is debatable. It is also hard to compare because Dataflow is a managed service, whereas you'll have to spin up your own cluster for other Runn

Re: Python Portable Runner Issues

2019-09-25 Thread Lukasz Cwik
Google Dataflow currently uses a JSON representation of the pipeline graph and also the pipeline proto. We represent the graph in two different ways which leads to some wonderful *features*. Google Dataflow also sidesteps the Beam job service since Dataflow has its own Job API. Supporting the Beam

Re: Python Portable Runner Issues

2019-09-18 Thread Chad Dombrova
Just note that while Dataflow does have robust Python support it does not fully support the portability framework. It’s a bit of a blurry distinction, and honestly I’m not crystal clear on this as I get the impression that Dataflow may be a bit of a Portability hybrid. It does not use the job ser

Re: Python Portable Runner Issues

2019-09-18 Thread Holden Karau
Probably the most stable is running on Dataflow still. But I’m excited to see the progress towards a Spark runner, can’t wait to try TFT on it :) On Tue, Sep 17, 2019 at 4:37 PM Kyle Weaver wrote: > The Flink runner is definitely more stable, as it's been around for longer > and has more develop

Re: Python Portable Runner Issues

2019-09-17 Thread Kyle Weaver
The Flink runner is definitely more stable, as it's been around for longer and has more developers and users on it. But a lot of the code is shared, so for example some of the issues above would also happen on the Flink runner. Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.c

Re: Python Portable Runner Issues

2019-09-17 Thread Benjamin Tan
Thanks for all the replies Kyle! You've been super helpful :D. Would you say that the Flink runner is more stable than the Spark one? Or which combo is the most stable for now? On 2019/09/17 19:43:54, Tom Barber wrote: > Thanks Kyle, > > From my pov Alpha is fine, I’m just trying to test out som

Re: Python Portable Runner Issues

2019-09-17 Thread Tom Barber
Thanks Kyle, From my pov Alpha is fine, I’m just trying to test out some of the capabilities currently, but trying to dig around the website doesn’t explain a great deal. Luckily Benjamin seems a step ahead of me… I hope it stays that way! ;) On 17 September 2019 at 19:33:40, Kyle Weaver (kcwe

Re: Python Portable Runner Issues

2019-09-17 Thread Kyle Weaver
> The amount of issues I've encountered as a newbie is indeed troubling. Spark portability is very much "alpha" quality software, a point we should maybe emphasize on the website more. Anyway, I appreciate your patience, and I'll do my best to address all these issues. > org.apache.beam.vendor.grp

Re: Python Portable Runner Issues

2019-09-17 Thread Benjamin Tan
:D. Still, I'm curious as to the error we both are getting. Maybe someone can shed some light on it. On Tue, Sep 17, 2019 at 10:54 PM Tom Barber wrote: > I do see hello written to 1 file and world to another, I guess it works! > Thanks for the pointers Benjamin I was about to give up. > > Tom >

Re: Python Portable Runner Issues

2019-09-17 Thread Tom Barber
I do see hello written to 1 file and world to another, I guess it works! Thanks for the pointers Benjamin I was about to give up. Tom On 17 September 2019 at 15:51:13, Benjamin Tan (benjamintanwei...@gmail.com) wrote: Tell me if you see any output. Anyway, here's the link to the same issue you'

Re: Python Portable Runner Issues

2019-09-17 Thread Benjamin Tan
Tell me if you see any output. Anyway, here's the link to the same issue you're facing: https://lists.apache.org/thread.html/4e8e1455916debe096de32551f9ab05853524cf282bc312cd4620d68@%3Cuser.beam.apache.org%3E The amount of issues I've encountered as a newbie is indeed troubling. On 2019/09/17

Re: Python Portable Runner Issues

2019-09-17 Thread Tom Barber
🤣 okay I’ll look again, I assumed it just crashed in a ball of flames! On 17 September 2019 at 15:39:33, Benjamin Tan (benjamintanwei...@gmail.com) wrote: I got this too! Did you manage to get any output? (I did) I reported this in another thread. This looks like a key error when StopWorker is

Re: Python Portable Runner Issues

2019-09-17 Thread Tom Barber
Well my errors are different but still terminal: ERROR:grpc._server:Exception calling application: u'1-1' Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/grpc/_server.py", line 434, in _call_behavior response_or_iterator = behavior(argument, context) File "/

Re: Python Portable Runner Issues

2019-09-17 Thread Benjamin Tan
If it helps, I’m using Spark 2.4.4. The Apache Beam Python library on master is 2.17.0-dev. > On 17 Sep 2019, at 9:39 PM, Tom Barber wrote: > > Cool thanks Benjamin, I’ll give it a shot. > > Tom > > >> On 17 September 2019 at 13:56:14, Benjamin Tan (benjamintanwei...@gmail.com) >> wrote:

Re: Python Portable Runner Issues

2019-09-17 Thread Tom Barber
Cool thanks Benjamin, I’ll give it a shot. Tom On 17 September 2019 at 13:56:14, Benjamin Tan (benjamintanwei...@gmail.com) wrote: I encountered the exact same thing today. High five! Here’s how I managed to make some progress: 1. Used the master branch 2. Built and installed the Python SDK

Re: Python Portable Runner Issues

2019-09-17 Thread Benjamin Tan
I encountered the exact same thing today. High five! Here’s how I managed to make some progress: 1. Used the master branch 2. Built and installed the Python SDK cd into the sdk library and python ./setup.py install I got some other errors but they didn’t seem to be show stoppers. > On 17 Sep 2

Python Portable Runner Issues

2019-09-17 Thread Tom Barber
Hello folks, Day 3 of trying to get the basics going with Python & Spark 2.2.3. I’ve downgraded the Spark version to 2.2.3 in the Gradle build so that I can run jobs against it. I’ve then written this: options = PipelineOptions(["--runner=PortableRunner", "--job_endpoint=localhost:8099", "--env
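
For anyone following along, here is a minimal sketch of the kind of test pipeline being discussed. It is not Tom's actual script, which is cut off above. The `--runner` and `--job_endpoint` flags are the ones from his message; the truncated `--env…` flag is deliberately left out and can be supplied through `extra_args`. The helper names, the "hello"/"world" input, and the output path are assumptions made for illustration.

```python
def portable_runner_args(job_endpoint="localhost:8099", extra_args=()):
    """Build the flag list handed to PipelineOptions for the portable runner.

    Hypothetical helper: only the two flags visible in the original message
    are set here; any environment flags must be passed in via extra_args.
    """
    args = [
        "--runner=PortableRunner",
        "--job_endpoint={}".format(job_endpoint),
    ]
    args.extend(extra_args)
    return args


def run(args, output_prefix="/tmp/beam-portable-test"):
    """Submit a two-element pipeline to the job service named in args.

    Requires apache_beam and a job server listening on the job endpoint.
    The output prefix is an assumed location, not one taken from the thread.
    """
    # Imported here so the flag helper above works without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(args)
    with beam.Pipeline(options=options) as p:
        (p
         | "Create" >> beam.Create(["hello", "world"])
         | "Write" >> beam.io.WriteToText(output_prefix))
```

Calling `run(portable_runner_args())` submits the job to whatever job server is listening on localhost:8099. Whether it completes cleanly depends on the runner and SDK versions in use, as the replies in this thread show.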