Different runners have different characteristics, pros, and cons,
which is part of the value proposition for Beam. We have some
comparisons at https://beam.apache.org/documentation/runners/capability-matrix/
but these were put together a while ago and don't really take into
account the current state of things.
That is debatable. It is also hard to compare because Dataflow is a
managed service, whereas you'll have to spin up your own cluster for
other Runners.
Google Dataflow currently uses a JSON representation of the pipeline graph
and also the pipeline proto. We represent the graph in two different ways
which leads to some wonderful *features*.
Google Dataflow also sidesteps the Beam job service since Dataflow has its
own Job API. Supporting the Beam
Just note that while Dataflow does have robust python support it does not
fully support the portability framework. It’s a bit of a blurry
distinction, and honestly I’m not crystal clear on this as I get the
impression that Dataflow may be a bit of a Portability hybrid. It does not
use the job service.
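Since several replies point to Dataflow as the most stable option today, here is a minimal sketch of what aiming the same Python pipeline at Dataflow looks like; every GCP-specific value below (project, region, bucket) is a placeholder, not something from this thread:

```python
# Sketch: options for running on Dataflow instead of a portable runner.
# All GCP-specific values here are placeholders.
dataflow_opts = [
    "--runner=DataflowRunner",
    "--project=my-gcp-project",            # placeholder
    "--region=us-central1",                # placeholder
    "--temp_location=gs://my-bucket/tmp",  # placeholder
]

# With the Beam Python SDK installed, these would be passed straight in:
# from apache_beam.options.pipeline_options import PipelineOptions
# options = PipelineOptions(dataflow_opts)
```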
Probably the most stable is running on Dataflow still. But I’m excited to
see the progress towards a Spark runner, can’t wait to try TFT on it :)
On Tue, Sep 17, 2019 at 4:37 PM Kyle Weaver wrote:
> The Flink runner is definitely more stable, as it's been around for longer
> and has more developers and users on it.
The Flink runner is definitely more stable, as it's been around for longer
and has more developers and users on it. But a lot of the code is shared,
so for example some of the issues above would also happen on the Flink
runner.
Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
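Since the client-side code is shared, switching between the Flink and Spark portable runners is mostly a question of which job server you start and point --job_endpoint at. A minimal sketch, assuming port 8099, the usual default for the Gradle-run job servers:

```python
# Sketch: the same pipeline options work for either portable runner; only
# the job server you start differs. Port 8099 is an assumption (the common
# default), not a value confirmed in this thread.
def portable_opts(job_endpoint):
    return [
        "--runner=PortableRunner",
        "--job_endpoint=" + job_endpoint,
    ]

# e.g. after starting the Flink or the Spark job server locally:
flink_opts = portable_opts("localhost:8099")
spark_opts = portable_opts("localhost:8099")
```

With apache_beam installed, either list would be handed to PipelineOptions exactly as in the snippet later in the thread.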
Thanks for all the replies Kyle! You've been super helpful :D.
Would you say that the Flink runner is more stable than the Spark one? Or which
combo is the most stable for now?
On 2019/09/17 19:43:54, Tom Barber wrote:
> Thanks Kyle,
>
> From my pov Alpha is fine, I’m just trying to test out some of the capabilities.
Thanks Kyle,
From my pov Alpha is fine, I’m just trying to test out some of the
capabilities currently, but trying to dig around the website doesn’t
explain a great deal. Luckily Benjamin seems a step ahead of me… I hope it
stays that way! ;)
On 17 September 2019 at 19:33:40, Kyle Weaver (kcwea...@google.com) wrote:
> The amount of issues I've encountered as a newbie is indeed troubling.
Spark portability is very much "alpha" quality software, a point we should
maybe emphasize on the website more. Anyway, I appreciate your patience,
and I'll do my best to address all these issues.
:D. Still, I'm curious as to the error we both are getting. Maybe someone
can shed some light on it.
On Tue, Sep 17, 2019 at 10:54 PM Tom Barber wrote:
> I do see hello written to 1 file and world to another, I guess it works!
> Thanks for the pointers Benjamin I was about to give up.
>
> Tom
>
I do see hello written to 1 file and world to another, I guess it works!
Thanks for the pointers Benjamin I was about to give up.
Tom
On 17 September 2019 at 15:51:13, Benjamin Tan (benjamintanwei...@gmail.com)
wrote:
Tell me if you see any output. Anyway, here's the link to the same issue you're
facing:
https://lists.apache.org/thread.html/4e8e1455916debe096de32551f9ab05853524cf282bc312cd4620d68@%3Cuser.beam.apache.org%3E
The amount of issues I've encountered as a newbie is indeed troubling.
🤣 okay I’ll look again, I assumed it just crashed in a ball of flames!
On 17 September 2019 at 15:39:33, Benjamin Tan (benjamintanwei...@gmail.com)
wrote:
I got this too! Did you manage to get any output? (I did) I reported this
in another thread.
This looks like a key error when StopWorker is called.
Well my errors are different but still terminal:
ERROR:grpc._server:Exception calling application: u'1-1'
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/grpc/_server.py", line 434,
in _call_behavior
response_or_iterator = behavior(argument, context)
File
"/
If it helps, I’m using Spark 2.4.4. The Apache Beam Python library on master is
2.17.0-dev.
> On 17 Sep 2019, at 9:39 PM, Tom Barber wrote:
>
> Cool thanks Benjamin, I’ll give it a shot.
>
> Tom
>
>
>> On 17 September 2019 at 13:56:14, Benjamin Tan (benjamintanwei...@gmail.com)
>> wrote:
Cool thanks Benjamin, I’ll give it a shot.
Tom
On 17 September 2019 at 13:56:14, Benjamin Tan (benjamintanwei...@gmail.com)
wrote:
I encountered the exact same thing today. High five! Here’s how I managed to
make some progress:
1. Used the master branch
2. Built and installed the Python SDK
cd into the SDK directory and run python setup.py install
I got some other errors but they didn’t seem to be show stoppers.
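For anyone following along, Benjamin's steps amount to roughly the following commands; the clone URL and path are assumptions, since the message only says "master branch" and "cd into the sdk library":

```python
# The build-and-install steps above, spelled out. These are shell commands
# collected in a list here for clarity; the repo URL and path are assumptions.
steps = [
    "git clone https://github.com/apache/beam.git",  # 1. use the master branch
    "cd beam/sdks/python",                           # into the Python SDK
    "python setup.py install",                       # 2. build and install it
]
print("\n".join(steps))
```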
Hello folks,
Day 3 of trying to get the basics going with Python & Spark 2.2.3.
I’ve downgraded the Spark version to 2.2.3 in the Gradle build so that I
can run jobs against it.
I’ve then written this:
options = PipelineOptions(["--runner=PortableRunner",
"--job_endpoint=localhost:8099", "--env