Spark runner's EvaluationContext <https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/EvaluationContext.java#L201> has a hook ready for this - but clearly only for batch, in streaming this feature doesn't seem relevant.
You can easily hack this in the Spark runner, but for Beam in general I wonder how this would work in a runner-agnostic way ? Spark has a driver process, not sure how this works for other runners. On Mon, Feb 20, 2017 at 11:54 AM Antony Mayi <[email protected]> wrote: > Thanks Jean, > > my point is to retrieve the data represented let say as > PCollection<String> to a List<String> (not PCollection<List<String>>) - > essentially fetching it all to address space of the local driver process > (this is what Spark's .collect() does). It would be a reverse of the > beam.sdks.transforms.Create - which takes a local iterable and distributes > it into PCollection - at the end of the pipeline I would like to get the > result back to single iterable (hence I assuming it would need some type of > Sink). > > Thanks, > Antony. > > > On Monday, 20 February 2017, 10:40, Jean-Baptiste Onofré <[email protected]> > wrote: > > > Hi Antony, > > The Spark runner deals with caching/persist for you (analyzing how many > time the same PCollection is used). > > For the collect(), I don't fully understand your question. > > If if you want to process elements in the PCollection, you can do simple > ParDo: > > .apply(ParDo.of(new DoFn() { > @ProcessElement > public void processElements(ProcessContext context) { > T element = context.element(); > // do whatever you want > } > }) > > Is it not what you want ? > > Regards > JB > > On 02/20/2017 10:30 AM, Antony Mayi wrote: > > Hi, > > > > what is the best way to fetch content of PCollection to local > > memory/process (something like calling .collect() on Spark rdd)? Do I > > need to implement custom Sink? > > > > Thanks for any clues, > > Antony. > > > -- > Jean-Baptiste Onofré > [email protected] > http://blog.nanthrax.net > Talend - http://www.talend.com > > > >
