Makes sense, thanks for everything,
a.
On Wednesday, 10 May 2017, 11:12, Jean-Baptiste Onofré <[email protected]> wrote:
Hi Antony,
not directly from the Beam SDK: you have to be "in" the Spark runner to do that,
by adding your own PTransform and a corresponding translator.
Otherwise, we would lose the "portability" of the pipeline across different
runners.
Regards
JB
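For reference, here is a minimal sketch of the workaround discussed in this thread, assuming the Beam Spark runner and Spark are on the classpath. One JavaSparkContext is created up front and handed to the SparkRunner through SparkContextOptions (setUsesProvidedSparkContext / setProvidedSparkContext). The small, already-reduced RDD result is then bridged into Beam with Create.of(rdd.collect()), since there is no direct RDD-to-PCollection path from the SDK. The class name, app name, "local[2]" master and the parallelize(...) input standing in for the existing batch job are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.beam.runners.spark.SparkContextOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical example class: one Spark context shared by a plain Spark job and a Beam pipeline.
public class SharedSparkContextExample {

  // Builds Beam options that ask the SparkRunner to reuse `jsc`
  // instead of creating its own Spark context.
  static SparkContextOptions optionsFor(JavaSparkContext jsc) {
    SparkContextOptions options = PipelineOptionsFactory.as(SparkContextOptions.class);
    options.setRunner(SparkRunner.class);
    options.setUsesProvidedSparkContext(true); // flag declared in SparkPipelineOptions
    options.setProvidedSparkContext(jsc);      // the context to reuse
    return options;
  }

  public static void main(String[] args) {
    // Hypothetical stand-in for the existing Spark batch job's context.
    JavaSparkContext jsc =
        new JavaSparkContext(new SparkConf().setAppName("shared-context").setMaster("local[2]"));
    try {
      // Assumed small, already-reduced output, so collecting it to the driver is acceptable.
      List<String> reduced =
          jsc.parallelize(Arrays.asList("a", "b", "c")).map(s -> s.toUpperCase()).collect();

      // The Beam pipeline runs on the same context; Create.of turns the
      // collected values into a PCollection.
      Pipeline pipeline = Pipeline.create(optionsFor(jsc));
      PCollection<String> fromSpark = pipeline.apply("FromSparkJob", Create.of(reduced));
      // Further Beam transforms on fromSpark would go here.
      pipeline.run().waitUntilFinish();
    } finally {
      jsc.stop(); // this code created the context, so it stops it explicitly
    }
  }
}
```

Creating the context once and passing it in, rather than letting each side build its own, is what avoids the "two separate Spark contexts" failure described below.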
On 05/10/2017 11:08 AM, Antony Mayi wrote:
> very useful, thanks!
>
> btw, to avoid calling Create.of(rdd.collect()), is there by any chance a way
> to get a PCollection directly from an RDD?
>
> thx,
> antony.
>
>
> On Wednesday, 10 May 2017, 10:37, Jean-Baptiste Onofré <[email protected]> wrote:
>
>
> Hi Antony,
>
> yes, it's possible to "inject"/reuse an existing Spark context via the
> pipeline options. From SparkPipelineOptions:
>
> @Description("If the spark runner will be initialized with a provided Spark Context. "
>     + "The Spark Context should be provided with SparkContextOptions.")
> @Default.Boolean(false)
> boolean getUsesProvidedSparkContext();
> void setUsesProvidedSparkContext(boolean value);
>
> Regards
> JB
>
> On 05/10/2017 10:16 AM, Antony Mayi wrote:
>> I've got a (dirty) use case where I have an existing Spark batch job which
>> produces an output that I would like to feed into my Beam pipeline (assuming
>> it runs on the SparkRunner). I was trying to run both as one job (the output
>> is reduced, so it's not big data and it's OK to do something like
>> Create.of(rdd.collect())), but that fails because of the two separate
>> Spark contexts.
>>
>> Is it possible to build the Beam pipeline on an existing Spark context?
>>
>> thx,
>> Antony.
>
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
>
>