Hi Antony,

The Spark runner deals with caching/persist for you (analyzing how many time the same PCollection is used).

For the collect(), I don't fully understand your question.

If if you want to process elements in the PCollection, you can do simple ParDo:

.apply(ParDo.of(new DoFn() {
  @ProcessElement
  public void processElements(ProcessContext context) {
    T element = context.element();
    // do whatever you want
  }
})

Is it not what you want ?

Regards
JB

On 02/20/2017 10:30 AM, Antony Mayi wrote:
Hi,

what is the best way to fetch content of PCollection to local
memory/process (something like calling .collect() on Spark rdd)? Do I
need to implement custom Sink?

Thanks for any clues,
Antony.

--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com

Reply via email to