Hi Antony,
The Spark runner deals with caching/persist for you (analyzing how many
time the same PCollection is used).
For the collect(), I don't fully understand your question.
If if you want to process elements in the PCollection, you can do simple
ParDo:
.apply(ParDo.of(new DoFn() {
@ProcessElement
public void processElements(ProcessContext context) {
T element = context.element();
// do whatever you want
}
})
Is it not what you want ?
Regards
JB
On 02/20/2017 10:30 AM, Antony Mayi wrote:
Hi,
what is the best way to fetch content of PCollection to local
memory/process (something like calling .collect() on Spark rdd)? Do I
need to implement custom Sink?
Thanks for any clues,
Antony.
--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com