Re: Beam support on Spark 2.x

Artur Mrozowski Mon, 13 Nov 2017 07:15:30 -0800

Ok, thank you.

On Mon, Nov 13, 2017 at 7:20 AM, Jean-Baptiste Onofré <[email protected]>
wrote:


> Hi Artur,
>
> I was talking about my "own" beam-samples that I'm using for the tests:
>
> https://github.com/jbonofre/beam-samples
>
> It's already possible to run on Spark 1.6 using the Spark runner provided in
> Beam 2.1.0.
>
> For Spark 2.0, you will have to wait for the Spark runner that will be
> provided in Beam 2.3.0.
>
> Regards
> JB
>
> On 11/13/2017 06:50 AM, Artur Mrozowski wrote:
>
>> Hi Jean-Baptiste,
>> that's great news. When you mention beam-samples, are you referring to the
>> gaming examples? https://github.com/eljefe6a/beamexample
>>
>> Those examples cover a lot of what we are trying to achieve in our PoC, so
>> that's just great. Should it be possible to run these on both the 1.6 and
>> 2.0 versions of Spark?
>>
>> We prefer the 2.0 version of Spark.
>>
>> Best Regards
>> Artur
>>
>> On Fri, Nov 10, 2017 at 1:54 PM, Jean-Baptiste Onofré <[email protected]>
>> wrote:
>>
>>     Hi,
>>
>>     I guess you are not following the dev mailing list.
>>
>>     The Spark runner supports almost all transforms and yes, you can fully
>>     use the Spark runner to run your pipelines.
>>
>>     A PCollection is represented as an RDD, and the runner currently
>>     targets Spark 1.x.
>>
>>     I'm working on Spark 2.x support (still using RDDs): we have a vote in
>>     progress on the mailing list on whether to support both Spark 1.x and
>>     Spark 2.x, or just upgrade to Spark 2.x and drop support for Spark 1.x.
>>
>>     You can take a look at the beam-samples: they all run using the Spark
>>     runner.
>>
>>     Regards
>>     JB
>>
>>
>>     On 11/10/2017 01:46 PM, Artur Mrozowski wrote:
>>
>>         Hi,
>>         I have seen the compatibility matrix and I realize that Spark is not
>>         the most supported runner.
>>         I am curious whether it is possible to run a pipeline on Spark, say
>>         with global windows, after-processing-time triggers and group-by-key
>>         operations (CoGroupByKey, CombineByKey). We definitely have problems
>>         executing a pipeline that runs successfully on the direct runner.
>>
>>         Is that a known issue? Is Flink the best option?
>>
>>         Best Regards
>>         Artur
>>
>>
>>     --
>>     Jean-Baptiste Onofré
>>     [email protected]
>>     http://blog.nanthrax.net
>>     Talend - http://www.talend.com
>>
>>
>>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
