Yes, everytime you start the pipeline you need to read in all the config data. You can do this by flattening a bounded source which reads the current state of config data with a streaming source that gets updates and then use that as a side input. On the other hand, with a few million keys, using an in memory cache with your own refresh policy that contacts your datastore would also likely work well.
On Thu, Oct 5, 2017 at 10:16 AM, Yihua Fang <[email protected]> wrote: > I thought about streaming the updates and using side input, but since the > config data are not persisted in the data pipeline, how would the config > being populated in the first place. Is the solution to populate through > streaming every time the pipeline is rebooted? > > 1. The config data can be a few minutes to a few hours late. No need to > immediately reflects the config. > 2. The config data shouldn't be change often. It is configured by human > users. > 3. The config data per key should be about 10-20 key value pairs. > 4. Ideally the key number is in the range of a few millions, but a few > thousands to begin with. > > Thanks > Eric > > On Thu, Oct 5, 2017 at 9:09 AM Lukasz Cwik <[email protected]> wrote: > >> Can you stream the updates to the keys into the pipeline and then use it >> as a side input performing a join against on your main stream that needs >> the config data? >> You could also use an in memory cache that periodically refreshes keys >> from the external source. >> >> A better answer depends on: >> * how stale can the config data be? >> * how often does the config data change? >> * how much config data you expect to have per key? >> * how many keys do you expect to have? >> >> >> On Wed, Oct 4, 2017 at 5:41 PM, Yihua Fang <[email protected]> >> wrote: >> >>> Hi, >>> >>> The use case is that I have an external source that store a >>> configuration for each key accessible via restful APIs and Beam pipeline >>> should use the config to process each element for each key. What is the >>> best approach to facilitate injecting the latest config into the pipeline? >>> >>> Thanks >>> Eric >>> >> >>
