That should have said: ~100s of MiBs per window in streaming pipelines.

On Wed, Jul 5, 2017 at 2:58 PM, Lukasz Cwik <[email protected]> wrote:
> #1, side input supported sizes and performance are specific to a runner.
> For example, I know that Dataflow supports side inputs which are 1+ TiB
> (aggregate) in batch pipelines and ~100s of MiBs per window, because there
> have been several one-off benchmarks/runs. What kinds of sizes/use cases do
> you want to support? Some runners will do a much better job with really
> small side inputs, while others will be better with really large side
> inputs.
>
> #2, this depends on which library you're using to perform the REST calls
> and whether it is thread safe. DoFns can be shared across multiple bundles
> and can contain methods marked with @Setup/@Teardown, which only get
> invoked once per DoFn instance (which is relatively infrequently), and you
> could store an instance per DoFn instead of a singleton if the REST library
> is not thread safe.
>
> On Wed, Jul 5, 2017 at 2:45 PM, Randal Moore <[email protected]> wrote:
>
>> I have a step in my Beam pipeline that needs some data from a REST
>> service. The data acquired from the REST service is dependent on the
>> context of the data being processed and relatively large. The REST client
>> I am using isn't serializable, nor is it likely possible to make it so
>> (background threads, etc.).
>>
>> #1 What are the practical limits to the size of side inputs (e.g., I
>> could try to gather all the data from the REST service and provide it as
>> a side input)?
>>
>> #2 Assuming that using the REST client is the better option, would a
>> singleton instance be a safe way to instantiate the REST client?
>>
>> Thanks,
>> rdm
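To make the #2 suggestion concrete, here is a minimal sketch of the per-instance lifecycle pattern. `RestClient` is a hypothetical stand-in for the non-serializable client, and the class uses plain methods rather than extending Beam's `DoFn`, so the example is self-contained; in a real pipeline the class would extend `org.apache.beam.sdk.transforms.DoFn` and the lifecycle methods would carry the `@Setup` and `@Teardown` annotations, as noted in the comments.

```java
import java.util.ArrayList;
import java.util.List;

public class Main {

    // Hypothetical stand-in for a non-serializable REST client
    // (background threads, open connections, etc.).
    static class RestClient {
        String fetch(String key) { return "data-for-" + key; }
        void close() {}
    }

    // DoFn-style class: the client field is transient so it is never
    // serialized with the function. It is created once per DoFn instance
    // in setup() (Beam: @Setup) and released in teardown() (Beam:
    // @Teardown), so many bundles reuse the same client without any
    // shared singleton.
    static class EnrichFn {
        private transient RestClient client;
        int setupCalls = 0;  // exposed only to show setup() runs once

        void setup() {                 // Beam: @Setup
            client = new RestClient();
            setupCalls++;
        }

        String processElement(String element) {  // Beam: @ProcessElement
            return element + ":" + client.fetch(element);
        }

        void teardown() {              // Beam: @Teardown
            client.close();
            client = null;
        }
    }

    public static void main(String[] args) {
        EnrichFn fn = new EnrichFn();
        fn.setup();  // the runner calls this once per instance
        List<String> out = new ArrayList<>();
        for (String e : new String[] {"a", "b"}) {
            out.add(fn.processElement(e));  // one client serves all elements
        }
        fn.teardown();
        System.out.println(out);  // [a:data-for-a, b:data-for-b]
    }
}
```

Because each worker gets its own `EnrichFn` instance, this avoids both serialization of the client and cross-thread sharing, which is exactly why it is safer than a JVM-wide singleton when the library's thread safety is unknown.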
