Hi,
I guess you are not following the dev mailing list.
The Spark runner supports almost all transforms, and yes, you can use the Spark
runner to run your pipelines.
A PCollection is represented as an RDD, and the runner currently targets Spark 1.x.
I'm working on Spark 2.x support (still using RDDs): a vote is in progress on
the mailing list on whether to support both Spark 1.x and Spark 2.x, or to
just upgrade to Spark 2.x and drop support for Spark 1.x.
You can take a look at the beam-samples: they all run using the Spark runner.
Regards
JB
On 11/10/2017 01:46 PM, Artur Mrozowski wrote:
Hi,
I have seen the compatibility matrix and I realize that Spark is not the most
supported runner.
I am curious whether it is possible to run a pipeline on Spark, say with global
windows, after-processing-time triggers, and grouping by key (CoGroupByKey,
CombineByKey). We definitely have problems executing a pipeline that runs
successfully on the direct runner.
Is that a known issue? Is Flink the best option?
Best Regards
Artur
--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com