I thought about streaming the updates and using a side input, but since the config data are not persisted in the data pipeline, how would the config be populated in the first place? Is the solution to repopulate it through streaming every time the pipeline is restarted?
1. The config data can be a few minutes to a few hours late. There is no need to reflect config changes immediately.
2. The config data shouldn't change often; it is configured by human users.
3. The config data per key should be about 10-20 key-value pairs.
4. Ideally the number of keys is in the range of a few million, but a few thousand to begin with.

Thanks
Eric

On Thu, Oct 5, 2017 at 9:09 AM Lukasz Cwik <[email protected]> wrote:

> Can you stream the updates to the keys into the pipeline and then use it
> as a side input, performing a join against your main stream that needs
> the config data?
> You could also use an in-memory cache that periodically refreshes keys
> from the external source.
>
> A better answer depends on:
> * how stale can the config data be?
> * how often does the config data change?
> * how much config data do you expect to have per key?
> * how many keys do you expect to have?
>
> On Wed, Oct 4, 2017 at 5:41 PM, Yihua Fang <[email protected]> wrote:
>
>> Hi,
>>
>> The use case is that I have an external source that stores a
>> configuration for each key, accessible via RESTful APIs, and the Beam
>> pipeline should use that config to process each element for each key.
>> What is the best approach to inject the latest config into the
>> pipeline?
>>
>> Thanks
>> Eric
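Since staleness of minutes to hours is acceptable, Lukasz's second suggestion (an in-memory cache that periodically refreshes keys from the external source) fits the constraints. A minimal sketch, to be used from a DoFn's setup/process methods, might look like the following; `fetch_fn` is a hypothetical callable wrapping the RESTful config API, and the TTL value is just an illustration:

```python
import time


class RefreshingConfigCache:
    """In-memory, per-worker config cache.

    A key's config is fetched lazily on first use and re-fetched
    once the cached entry is older than ``ttl_seconds``. With
    10-20 small key-value pairs per key, even millions of keys
    fit comfortably in worker memory.
    """

    def __init__(self, fetch_fn, ttl_seconds=600):
        self._fetch_fn = fetch_fn        # hypothetical REST wrapper: key -> config dict
        self._ttl = ttl_seconds
        self._entries = {}               # key -> (fetched_at, config)

    def get(self, key):
        now = time.time()
        entry = self._entries.get(key)
        if entry is None or now - entry[0] > self._ttl:
            # Missing or stale: re-fetch from the external source.
            config = self._fetch_fn(key)
            self._entries[key] = (now, config)
            return config
        return entry[1]


# Example with a stubbed fetch function: repeated lookups within
# the TTL are served from the cache without hitting the source.
calls = []

def fake_fetch(key):
    calls.append(key)
    return {"threshold": 5}

cache = RefreshingConfigCache(fake_fetch, ttl_seconds=600)
first = cache.get("user-1")   # triggers one fetch
second = cache.get("user-1")  # served from cache
```

Each worker keeps its own cache, so different workers may briefly see different config versions; that is fine given requirement 1 above.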
