I thought about streaming the updates and using a side input, but since the config data are not persisted in the data pipeline, how would the config be populated in the first place? Is the solution to repopulate it through streaming every time the pipeline is restarted?
1. The config data can be a few minutes to a few hours late. There is no need to reflect config changes immediately.
2. The config data shouldn't change often; it is configured by human users.
3. The config data per key should be about 10-20 key-value pairs.
4. Ideally the number of keys is in the range of a few million, but a few thousand to begin with.

Thanks
Eric

On Thu, Oct 5, 2017 at 9:09 AM Lukasz Cwik <[email protected]> wrote:

> Can you stream the updates to the keys into the pipeline and then use it
> as a side input, performing a join against your main stream that needs
> the config data?
> You could also use an in-memory cache that periodically refreshes keys
> from the external source.
>
> A better answer depends on:
> * how stale can the config data be?
> * how often does the config data change?
> * how much config data do you expect to have per key?
> * how many keys do you expect to have?
>
> On Wed, Oct 4, 2017 at 5:41 PM, Yihua Fang <[email protected]> wrote:
>
>> Hi,
>>
>> The use case is that I have an external source that stores a
>> configuration for each key, accessible via RESTful APIs, and the Beam
>> pipeline should use that config to process each element for each key.
>> What is the best approach to inject the latest config into the
>> pipeline?
>>
>> Thanks
>> Eric
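Since staleness of minutes to hours is acceptable, Lukasz's second suggestion (an in-memory cache that periodically refreshes keys from the external source) fits the constraints. A minimal sketch, to be used from a DoFn's setup/process methods, might look like the following; `fetch_fn` is a hypothetical callable wrapping the RESTful config API, and the TTL value is just an illustration:

```python
import time


class RefreshingConfigCache:
    """In-memory, per-worker config cache.

    A key's config is fetched lazily on first use and re-fetched
    once the cached entry is older than ``ttl_seconds``. With
    10-20 small key-value pairs per key, even millions of keys
    fit comfortably in worker memory.
    """

    def __init__(self, fetch_fn, ttl_seconds=600):
        self._fetch_fn = fetch_fn        # hypothetical REST wrapper: key -> config dict
        self._ttl = ttl_seconds
        self._entries = {}               # key -> (fetched_at, config)

    def get(self, key):
        now = time.time()
        entry = self._entries.get(key)
        if entry is None or now - entry[0] > self._ttl:
            # Missing or stale: re-fetch from the external source.
            config = self._fetch_fn(key)
            self._entries[key] = (now, config)
            return config
        return entry[1]


# Example with a stubbed fetch function: repeated lookups within
# the TTL are served from the cache without hitting the source.
calls = []

def fake_fetch(key):
    calls.append(key)
    return {"threshold": 5}

cache = RefreshingConfigCache(fake_fetch, ttl_seconds=600)
first = cache.get("user-1")   # triggers one fetch
second = cache.get("user-1")  # served from cache
```

Each worker keeps its own cache, so different workers may briefly see different config versions; that is fine given requirement 1 above.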
