Re: dealing with metadata coming in a separate pubsub

Lukasz Cwik Fri, 21 Apr 2017 11:01:42 -0700

How do you know when a record in the data pipeline has enough meta
information stored so that it can be processed?
How far behind is the metadata pubsub compared to the main pubsub?
Do you expect late data/metadata, and if so what do you want to do?


Also, side inputs aren't meant to be slow; their performance depends
mostly on how each runner implements them.
Depending on the windowing strategy, a side input will block until at least
the first trigger firing. After that, a runner will try to provide the
latest values, but there are no strong guarantees.
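
To make the blocking behavior concrete, here is a toy model in plain Python.
It is only a sketch and does not use the Beam API: the names ToySideInput,
fire, lookup, and run are hypothetical, and the event list is made-up sample
data. It assumes data records wait until the metadata side input has fired
once, and that every later lookup sees the most recent snapshot. A real
runner may serve a slightly stale snapshot, since there are no strong
guarantees about freshness.

```python
from typing import Dict, List, Optional, Tuple


class ToySideInput:
    """Hypothetical stand-in for a side input (not a Beam class).

    It is empty until the first firing, then holds the latest snapshot.
    """

    def __init__(self) -> None:
        self._snapshot: Optional[Dict[str, str]] = None

    def fire(self, snapshot: Dict[str, str]) -> None:
        # A trigger firing publishes a new snapshot of the metadata.
        self._snapshot = dict(snapshot)

    @property
    def ready(self) -> bool:
        return self._snapshot is not None

    def lookup(self, key: str) -> Optional[str]:
        if self._snapshot is None:
            raise RuntimeError("side input has not fired yet")
        # Returns None when the metadata for this key has not arrived.
        return self._snapshot.get(key)


def run(events: List[Tuple[str, object]]) -> List[Tuple[str, Optional[str]]]:
    """Process ("meta", dict) and ("data", key) events in arrival order."""
    side = ToySideInput()
    blocked: List[str] = []  # data keys waiting for the first firing
    out: List[Tuple[str, Optional[str]]] = []
    for kind, payload in events:
        if kind == "meta":
            side.fire(payload)
            # Release everything that was blocked on the first firing.
            for key in blocked:
                out.append((key, side.lookup(key)))
            blocked.clear()
        elif side.ready:
            out.append((payload, side.lookup(payload)))
        else:
            blocked.append(payload)
    return out


# Made-up sample stream: data arrives before any metadata has fired.
events = [
    ("data", "a"),
    ("meta", {"a": "v1"}),
    ("data", "b"),
    ("meta", {"a": "v2", "b": "v1"}),
    ("data", "a"),
]
result = run(events)
print(result)
```

In this toy run the first record for "a" waits for the first firing, while
the record for "b" is processed with no metadata because its entry had not
arrived yet. That second case is why it matters how you decide a record has
enough meta information and what you do with late metadata.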


On Fri, Apr 21, 2017 at 9:14 AM, sowmya balasubramanian <
[email protected]> wrote:

> Hi,
>
> I have 2 pubsubs - one that has data coming every 30 seconds and the
> second one that contains some meta information about the data points. The
> meta data pubsub is slower than the data pubsub.
>
> I have to use the metadata information to aggregate the input coming in
> the data pubsub. I need a good chunk of the metadata to arrive before
> I can do the aggregation, so I currently have a separate pipeline that
> processes this metadata pubsub and stores it.
>
> The data pipeline then uses this stored information to perform the
> aggregation.
>
> What I am wondering is, can I get rid of the metadata pipeline completely
> and instead use slow side inputs? I am stuck trying to evaluate the pros
> and cons of this approach because of my limited knowledge about slow side
> inputs. So, what are some of the things I need to consider if I head down
> the path of slow side inputs?
>
> Thanks,
> Sowmya
>
