That should have said: ~100s of MiBs per window in streaming pipelines.

On Wed, Jul 5, 2017 at 2:58 PM, Lukasz Cwik <[email protected]> wrote:
> #1, side input supported sizes and performance are specific to a runner.
> For example, I know that Dataflow supports side inputs which are 1+ TiB
> (aggregate) in batch pipelines and ~100s of MiBs per window, because there
> have been several one-off benchmarks/runs. What kinds of sizes/use cases do
> you want to support? Some runners will do a much better job with really
> small side inputs, while others will be better with really large side
> inputs.
>
> #2, this depends on which library you're using to perform the REST calls
> and whether it is thread safe. DoFns can be shared across multiple bundles
> and can contain methods marked with @Setup/@Teardown, which only get
> invoked once per DoFn instance (which is relatively infrequently), and you
> could store an instance per DoFn instead of a singleton if the REST library
> is not thread safe.
>
> On Wed, Jul 5, 2017 at 2:45 PM, Randal Moore <[email protected]> wrote:
>
>> I have a step in my Beam pipeline that needs some data from a REST
>> service. The data acquired from the REST service is dependent on the
>> context of the data being processed and relatively large. The REST client
>> I am using isn't serializable, nor is it likely possible to make it so
>> (background threads, etc.).
>>
>> #1 What are the practical limits to the size of side inputs (e.g., I
>> could try to gather all the data from the REST service and provide it as
>> a side input)?
>>
>> #2 Assuming that using the REST client is the better option, would a
>> singleton instance be a safe way to instantiate the REST client?
>>
>> Thanks,
>> rdm
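To make the #2 suggestion concrete, here is a minimal sketch of the per-instance lifecycle pattern. `RestClient` is a hypothetical stand-in for the non-serializable client, and the class uses plain methods rather than extending Beam's `DoFn`, so the example is self-contained; in a real pipeline the class would extend `org.apache.beam.sdk.transforms.DoFn` and the lifecycle methods would carry the `@Setup` and `@Teardown` annotations, as noted in the comments.

```java
import java.util.ArrayList;
import java.util.List;

public class Main {

    // Hypothetical stand-in for a non-serializable REST client
    // (background threads, open connections, etc.).
    static class RestClient {
        String fetch(String key) { return "data-for-" + key; }
        void close() {}
    }

    // DoFn-style class: the client field is transient so it is never
    // serialized with the function. It is created once per DoFn instance
    // in setup() (Beam: @Setup) and released in teardown() (Beam:
    // @Teardown), so many bundles reuse the same client without any
    // shared singleton.
    static class EnrichFn {
        private transient RestClient client;
        int setupCalls = 0;  // exposed only to show setup() runs once

        void setup() {                 // Beam: @Setup
            client = new RestClient();
            setupCalls++;
        }

        String processElement(String element) {  // Beam: @ProcessElement
            return element + ":" + client.fetch(element);
        }

        void teardown() {              // Beam: @Teardown
            client.close();
            client = null;
        }
    }

    public static void main(String[] args) {
        EnrichFn fn = new EnrichFn();
        fn.setup();  // the runner calls this once per instance
        List<String> out = new ArrayList<>();
        for (String e : new String[] {"a", "b"}) {
            out.add(fn.processElement(e));  // one client serves all elements
        }
        fn.teardown();
        System.out.println(out);  // [a:data-for-a, b:data-for-b]
    }
}
```

Because each worker gets its own `EnrichFn` instance, this avoids both serialization of the client and cross-thread sharing, which is exactly why it is safer than a JVM-wide singleton when the library's thread safety is unknown.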
