Very useful, thanks!
Btw, to avoid calling Create.of(rdd.collect()) - is there by any chance a way
to get a PCollection directly from an RDD?
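For reference, a minimal sketch of the Create.of(rdd.collect()) workaround being
discussed, assuming "rdd" is a JavaRDD<String> and "pipeline" is the Beam Pipeline
under construction (names illustrative only; uses org.apache.beam.sdk.transforms.Create
and org.apache.beam.sdk.values.PCollection):

  // Collect the (small) RDD on the driver and re-create its contents as a PCollection.
  PCollection<String> input = pipeline.apply(Create.of(rdd.collect()));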
thx,
Antony.
On Wednesday, 10 May 2017, 10:37, Jean-Baptiste Onofré <[email protected]>
wrote:
Hi Antony,
yes, it's possible to "inject"/reuse an existing Spark context via the pipeline
options. From the SparkPipelineOptions:
@Description("If the spark runner will be initialized with a provided Spark
Context. "
+ "The Spark Context should be provided with SparkContextOptions.")
@Default.Boolean(false)
boolean getUsesProvidedSparkContext();
void setUsesProvidedSparkContext(boolean value);
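As a minimal sketch of how a provided context can be wired up (assuming an existing
JavaSparkContext from the legacy job; the class and variable names below are
illustrative only, the provided-context setter lives on SparkContextOptions):

  import org.apache.beam.runners.spark.SparkContextOptions;
  import org.apache.beam.runners.spark.SparkRunner;
  import org.apache.beam.sdk.Pipeline;
  import org.apache.beam.sdk.options.PipelineOptionsFactory;
  import org.apache.spark.api.java.JavaSparkContext;

  public class ProvidedContextExample {
    public static void main(String[] args) {
      // The Spark context already created by the existing batch job (illustrative).
      JavaSparkContext jsc = new JavaSparkContext("local[2]", "provided-context-example");

      // Tell the Spark runner to reuse the provided context instead of creating its own.
      SparkContextOptions options = PipelineOptionsFactory.as(SparkContextOptions.class);
      options.setRunner(SparkRunner.class);
      options.setUsesProvidedSparkContext(true);
      options.setProvidedSparkContext(jsc);

      Pipeline pipeline = Pipeline.create(options);
      // ... apply the Beam transforms here ...
      pipeline.run().waitUntilFinish();

      jsc.stop();
    }
  }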
Regards
JB
On 05/10/2017 10:16 AM, Antony Mayi wrote:
> I've got a (dirty) use case where I have an existing Spark batch job which
> produces an output that I would like to feed into my Beam pipeline (assuming
> it runs on the SparkRunner). I was trying to run it as one job (the output is
> reduced, so it's not big data, hence it's OK to do something like
> Create.of(rdd.collect())), but that's failing because of the two separate
> Spark contexts.
>
> Is it possible to build the Beam pipeline on an existing Spark context?
>
> thx,
> Antony.
--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com