Re: appending beam pipeline to spark job

Jean-Baptiste Onofré Wed, 10 May 2017 02:13:14 -0700

Hi Antony,

Not directly from the Beam SDK: you have to be "in" the Spark runner to do so, adding your own PTransform and the corresponding translator.


Otherwise, we would lose the "portability" of the pipeline across different
runners.

Regards
JB

On 05/10/2017 11:08 AM, Antony Mayi wrote:

very useful, thanks!

Btw, to avoid calling Create.of(rdd.collect()): is there by any chance a way
to get a PCollection directly from an RDD?

thx,
antony.


On Wednesday, 10 May 2017, 10:37, Jean-Baptiste Onofré <[email protected]> 
wrote:


Hi Antony,

yes, it's possible to "inject"/reuse an existing Spark context via the pipeline
options. From the SparkPipelineOptions:

  @Description("If the spark runner will be initialized with a provided Spark Context. "
      + "The Spark Context should be provided with SparkContextOptions.")
  @Default.Boolean(false)
  boolean getUsesProvidedSparkContext();
  void setUsesProvidedSparkContext(boolean value);
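
As a minimal sketch of how these options might be wired together (assuming the
Beam Spark runner and Spark itself are on the classpath; the class name
ReuseSparkContext, the local[2] master and the sample data are hypothetical
placeholders, not part of the original job), the existing JavaSparkContext can
be handed to the pipeline through SparkContextOptions:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.beam.runners.spark.SparkContextOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical example class: an existing Spark batch step followed by a Beam pipeline
// that runs on the same Spark context.
public class ReuseSparkContext {
  public static void main(String[] args) {
    // Stand-in for the existing Spark batch job (app name and master are assumptions).
    SparkConf conf = new SparkConf().setAppName("existing-spark-job").setMaster("local[2]");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    try {
      JavaRDD<String> rdd = jsc.parallelize(Arrays.asList("a", "b", "c"));

      // The output is assumed to be small enough to collect on the driver.
      List<String> reduced = rdd.collect();

      // Tell the Spark runner to use the context created above instead of its own.
      SparkContextOptions options = PipelineOptionsFactory.as(SparkContextOptions.class);
      options.setRunner(SparkRunner.class);
      options.setUsesProvidedSparkContext(true);
      options.setProvidedSparkContext(jsc);

      Pipeline pipeline = Pipeline.create(options);

      // An explicit coder avoids coder inference failing on an empty list.
      PCollection<String> input =
          pipeline.apply(Create.of(reduced).withCoder(StringUtf8Coder.of()));

      // ... apply the rest of the Beam pipeline to `input` here ...

      pipeline.run().waitUntilFinish();
    } finally {
      // The caller created the context, so the caller stops it.
      jsc.stop();
    }
  }
}
```

Since the context is provided by the caller, the sketch assumes the caller
remains responsible for stopping it. The collect() step still pulls the data
onto the driver, so this only suits small, already reduced outputs.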

Regards
JB

On 05/10/2017 10:16 AM, Antony Mayi wrote:

I've got a (dirty) use case: an existing Spark batch job produces an output
that I would like to feed into my Beam pipeline (assuming it runs on the
SparkRunner). I was trying to run both as one job (the output is already
reduced, so it is not big data and something like Create.of(rdd.collect()) is
OK), but that fails because of the two separate Spark contexts.

Is it possible to build the Beam pipeline on an existing Spark context?

thx,
Antony.



--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com


