#1, the side input sizes that are supported, and how well they perform, are
specific to each runner. For example, I know that Dataflow supports side
inputs of 1+ TiB (aggregate) in batch pipelines and ~100s of MiB per window,
because there have been several one-off benchmarks/runs. What sizes and use
cases do you want to support? Some runners will do a much better job with
really small side inputs, while others will be better with really large ones.

#2, this depends on which library you're using to perform the REST calls and
whether it is thread safe. A DoFn instance can be reused across multiple
bundles and can contain methods marked with @Setup/@Teardown, which are
invoked only once per DoFn instance (so relatively infrequently). If the REST
library is not thread safe, you could store one client instance per DoFn
instead of using a singleton.
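
To make the per-DoFn option concrete, here is a rough sketch, assuming the
Beam Java SDK. RestClient, fetch(), EnrichFn and baseUrl are made-up names
standing in for your actual client and its configuration; only DoFn, @Setup,
@ProcessElement and @Teardown come from Beam.

```java
import org.apache.beam.sdk.transforms.DoFn;

/** Hypothetical stand-in for a REST client that cannot be serialized. */
class RestClient implements AutoCloseable {
  private final String baseUrl;

  RestClient(String baseUrl) {
    this.baseUrl = baseUrl;
  }

  /** Placeholder for a real HTTP call; here it only builds the request URL. */
  String fetch(String key) {
    return baseUrl + "/" + key;
  }

  @Override
  public void close() {
    // A real client would stop background threads and release connections here.
  }
}

/** Keeps one client per DoFn instance instead of a JVM-wide singleton. */
class EnrichFn extends DoFn<String, String> {
  // Only serializable configuration travels with the DoFn.
  private final String baseUrl;

  // transient: the client is never serialized, it is rebuilt in setup().
  private transient RestClient client;

  EnrichFn(String baseUrl) {
    this.baseUrl = baseUrl;
  }

  @Setup
  public void setup() {
    // Runs once per DoFn instance, before any bundle is processed.
    client = new RestClient(baseUrl);
  }

  /** Per-element lookup, kept separate so it can be called without a pipeline. */
  String lookup(String key) {
    return client.fetch(key);
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    c.output(lookup(c.element()));
  }

  @Teardown
  public void teardown() {
    // Runs when the runner discards this DoFn instance.
    if (client != null) {
      client.close();
      client = null;
    }
  }
}
```

You would apply it with something like ParDo.of(new EnrichFn(...)). If the
client does turn out to be thread safe, a lazily initialized static instance
shared by all DoFn instances in the JVM would also work and would avoid
creating one client per instance.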

On Wed, Jul 5, 2017 at 2:45 PM, Randal Moore <[email protected]> wrote:

> I have a step in my beam pipeline that needs some data from a rest
> service. The data acquired from the rest service is dependent on the
> context of the data being processed and relatively large. The rest client I
> am using isn't serializable - nor is it likely possible to make it so
> (background threads, etc.).
>
> #1 What are the practical limits to the size of side inputs (e.g., I could
> try to gather all the data from the rest service and provide it as a
> side-input)?
>
> #2 Assuming that using the rest client is the better option, would a
> singleton instance be safe way to instantiate the rest client?
>
> Thanks,
> rdm
>
