Re: ParDo(DoFn) with multiple context.output vs FlatMapElements

2024-01-26 Thread Robert Bradshaw via user
There is no difference; FlatMapElements is implemented in terms of a
DoFn that invokes context.output multiple times. And, yes, Dataflow
will fuse consecutive operations automatically. So if you have
something like

... -> DoFnA -> DoFnB -> GBK -> DoFnC -> ...

Dataflow will fuse DoFnA and DoFnB together, and if DoFnA produces a
lot of data for DoFnB to consume then more workers will be allocated
to handle the (DoFnA + DoFnB) combination. If the fanout is so huge
that a single worker would not be expected to handle the output DoFnA
produces from a single input, you could look into making DoFnA into a
SplittableDoFn https://beam.apache.org/blog/splittable-do-fn-is-available
. If DoFnB is just really expensive, you can also decouple the
parallelism between the two with a Reshuffle. Most of the time neither
of these is needed.

On Wed, Dec 27, 2023 at 5:44 PM hsy...@gmail.com  wrote:
>
> Hello
>
> I have a question. If I have a transform for each input it will emit 1 or 
> many output (same collection)
> I can do it with ParDo + DoFun while in processElement method for each input, 
> call context.output multiply times vs doing it with FlatMapElements, is there 
> any difference? Does the dataflow fuse the downstream transform 
> automatically? Eventually I want more downstream transform workers cause it 
> needs to handle more data, How do I supposed to do that?
>
> Regards,
> Siyuan


ParDo(DoFn) with multiple context.output vs FlatMapElements

2023-12-27 Thread hsy...@gmail.com
Hello

I have a question. If I have a transform for each input it will emit 1 or
many output (same collection)
I can do it with ParDo + DoFun while in processElement method for each
input, call context.output multiply times vs doing it with FlatMapElements,
is there any difference? Does the dataflow fuse the downstream transform
automatically? Eventually I want more downstream transform workers cause it
needs to handle more data, How do I supposed to do that?

Regards,
Siyuan