Agree with Amit. Not sure it would be portable to provide such function.
Regards JB On 02/20/2017 11:04 AM, Amit Sela wrote:
Spark runner's EvaluationContext <https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/EvaluationContext.java#L201> has a hook ready for this - but clearly only for batch, in streaming this feature doesn't seem relevant. You can easily hack this in the Spark runner, but for Beam in general I wonder how this would work in a runner-agnostic way ? Spark has a driver process, not sure how this works for other runners. On Mon, Feb 20, 2017 at 11:54 AM Antony Mayi <[email protected] <mailto:[email protected]>> wrote: Thanks Jean, my point is to retrieve the data represented let say as PCollection<String> to a List<String> (not PCollection<List<String>>) - essentially fetching it all to address space of the local driver process (this is what Spark's .collect() does). It would be a reverse of the beam.sdks.transforms.Create - which takes a local iterable and distributes it into PCollection - at the end of the pipeline I would like to get the result back to single iterable (hence I assuming it would need some type of Sink). Thanks, Antony. On Monday, 20 February 2017, 10:40, Jean-Baptiste Onofré <[email protected] <mailto:[email protected]>> wrote: Hi Antony, The Spark runner deals with caching/persist for you (analyzing how many time the same PCollection is used). For the collect(), I don't fully understand your question. If if you want to process elements in the PCollection, you can do simple ParDo: .apply(ParDo.of(new DoFn() { @ProcessElement public void processElements(ProcessContext context) { T element = context.element(); // do whatever you want } }) Is it not what you want ? Regards JB On 02/20/2017 10:30 AM, Antony Mayi wrote: > Hi, > > what is the best way to fetch content of PCollection to local > memory/process (something like calling .collect() on Spark rdd)? Do I > need to implement custom Sink? > > Thanks for any clues, > Antony. -- Jean-Baptiste Onofré [email protected] <mailto:[email protected]> http://blog.nanthrax.net <http://blog.nanthrax.net/> Talend - http://www.talend.com <http://www.talend.com/>
-- Jean-Baptiste Onofré [email protected] http://blog.nanthrax.net Talend - http://www.talend.com
