Very useful, thanks!
Btw, to avoid calling Create.of(rdd.collect()) - is there by any chance a way
to get a PCollection directly from an RDD?
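For reference, a minimal sketch of the Create.of(rdd.collect()) workaround being
discussed, assuming "rdd" is a JavaRDD<String> and "pipeline" is the Beam Pipeline
under construction (names illustrative only; uses org.apache.beam.sdk.transforms.Create
and org.apache.beam.sdk.values.PCollection):

  // Collect the (small) RDD on the driver and re-create its contents as a PCollection.
  PCollection<String> input = pipeline.apply(Create.of(rdd.collect()));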
thx,
Antony.
On Wednesday, 10 May 2017, 10:37, Jean-Baptiste Onofré <[email protected]>
wrote:
Hi Antony,
yes, it's possible to "inject"/reuse an existing Spark context via the pipeline
options. From the SparkPipelineOptions:
@Description("If the spark runner will be initialized with a provided Spark
Context. "
+ "The Spark Context should be provided with SparkContextOptions.")
@Default.Boolean(false)
boolean getUsesProvidedSparkContext();
void setUsesProvidedSparkContext(boolean value);
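As a minimal sketch of how a provided context can be wired up (assuming an existing
JavaSparkContext from the legacy job; the class and variable names below are
illustrative only, the provided-context setter lives on SparkContextOptions):

  import org.apache.beam.runners.spark.SparkContextOptions;
  import org.apache.beam.runners.spark.SparkRunner;
  import org.apache.beam.sdk.Pipeline;
  import org.apache.beam.sdk.options.PipelineOptionsFactory;
  import org.apache.spark.api.java.JavaSparkContext;

  public class ProvidedContextExample {
    public static void main(String[] args) {
      // The Spark context already created by the existing batch job (illustrative).
      JavaSparkContext jsc = new JavaSparkContext("local[2]", "provided-context-example");

      // Tell the Spark runner to reuse the provided context instead of creating its own.
      SparkContextOptions options = PipelineOptionsFactory.as(SparkContextOptions.class);
      options.setRunner(SparkRunner.class);
      options.setUsesProvidedSparkContext(true);
      options.setProvidedSparkContext(jsc);

      Pipeline pipeline = Pipeline.create(options);
      // ... apply the Beam transforms here ...
      pipeline.run().waitUntilFinish();

      jsc.stop();
    }
  }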
Regards
JB
On 05/10/2017 10:16 AM, Antony Mayi wrote:
> I've got a (dirty) use case where I have an existing Spark batch job which
> produces an output that I would like to feed into my Beam pipeline (assuming
> it runs on the SparkRunner). I was trying to run it as one job (the output is
> reduced, so it's not big data, hence it's OK to do something like
> Create.of(rdd.collect())), but that's failing because of the two separate
> Spark contexts.
>
> Is it possible to build the Beam pipeline on an existing Spark context?
>
> thx,
> Antony.
--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com