Hi Artur,
I was talking about my "own" beam-samples that I'm using for the tests:
https://github.com/jbonofre/beam-samples
It's already possible to run on Spark 1.6 using the Spark runner provided in Beam
2.1.0.
For Spark 2.0, you will have to wait for the Spark runner that will be provided in
Beam 2.3.0.
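For reference, a minimal sketch of the Maven dependency that pulls in the Spark runner at the Beam 2.1.0 version mentioned above. The coordinates are those of the published Beam artifact; where the fragment goes in your own pom.xml is an assumption about your build.

```xml
<!-- Sketch only: goes inside the <dependencies> section of your pom.xml -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-spark</artifactId>
  <version>2.1.0</version>
</dependency>
```

With the runner on the classpath, passing --runner=SparkRunner in the pipeline arguments selects it. Depending on how you launch the job, you may also need to add the Spark 1.6 artifacts to the build yourself.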
Regards
JB
On 11/13/2017 06:50 AM, Artur Mrozowski wrote:
Hi Jean-Baptiste,
that's great news. When you mention beam-samples, are you referring to the gaming
examples? https://github.com/eljefe6a/beamexample
Those examples cover a lot of what we are trying to achieve in our PoC, so that's
just great. Should it be possible to run these on both the 1.6 and 2.0 versions of
Spark? We prefer Spark 2.0.
Best Regards
Artur
On Fri, Nov 10, 2017 at 1:54 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
Hi,
I guess you are not following the dev mailing list.
The Spark runner supports almost all transforms and yes, you can fully use the
Spark runner to run your pipelines.
A PCollection is represented as an RDD, and the runner currently targets Spark 1.x.
I'm working on Spark 2.x support (still using RDDs): we have a vote in
progress on the mailing list on whether to support both Spark 1.x and Spark
2.x or to just upgrade to Spark 2.x and drop support for Spark 1.x.
You can take a look at the beam-samples: they all run using the Spark
runner.
Regards
JB
On 11/10/2017 01:46 PM, Artur Mrozowski wrote:
Hi,
I have seen the compatibility matrix and I realize that Spark is not the
most fully supported runner.
I am curious whether it is possible to run a pipeline on Spark with, say,
global windows, after-processing-time triggers and group-by-key operations
(CoGroupByKey, CombineByKey). We definitely have problems executing a
pipeline that runs successfully on the direct runner.
Is that a known issue? Is Flink the best option?
Best Regards
Artur
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com