Hi, Jose

If I understand your question correctly:
    Your job has two stages, and you want to handle the first stage
differently according to the event time of Stream A. That is, if an event
from Stream A is “too late”, you would enrich it from the external system;
otherwise you would enrich it with Stream B. The second stage is the same
in both cases, and you want to know the best practice for this situation.

    I don’t know whether there is a single best practice. :-) (Maybe others
do.) But personally, I prefer using one graph to handle this: from a cost
perspective, the second stage can then be reused (both its logic and its
runtime resources).
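
The single-graph approach can be sketched framework-agnostically. The
following is a minimal illustration (in plain Python, not Flink API) of the
topology: route each Stream A record by lateness, enrich late records from
a mocked external service (standing in for Flink Async I/O) and on-time
records from Stream B, then reuse one shared second stage that joins
Stream C. All names, thresholds, and data shapes below are hypothetical.

```python
# Illustrative sketch of one graph with two first-stage paths and a
# shared second stage. Not Flink API; all names are assumptions.

ALLOWED_LATENESS_MS = 60_000  # hypothetical lateness threshold


def is_late(event, watermark_ms):
    """Route to the 'late' path if the event time trails the watermark."""
    return event["ts"] < watermark_ms - ALLOWED_LATENESS_MS


def enrich_from_b(event, b_table):
    """On-time path: join Stream A with Stream B (here a lookup table)."""
    return {**event, "b": b_table.get(event["key"])}


def enrich_from_external(event, external_lookup):
    """Late path: stands in for Async I/O against the external system."""
    return {**event, "b": external_lookup(event["key"])}


def enrich_with_c(event, c_table):
    """Shared second stage: identical for both paths, so it is reused."""
    return {**event, "c": c_table.get(event["key"])}


def run_pipeline(stream_a, watermark_ms, b_table, c_table, external_lookup):
    out = []
    for ev in stream_a:
        if is_late(ev, watermark_ms):
            ab = enrich_from_external(ev, external_lookup)  # late side output
        else:
            ab = enrich_from_b(ev, b_table)                 # main flow
        out.append(enrich_with_c(ab, c_table))              # shared stage
    return out


if __name__ == "__main__":
    events = [
        {"key": "k1", "ts": 100_000},  # on time
        {"key": "k2", "ts": 10_000},   # weeks late in spirit
    ]
    result = run_pipeline(
        events,
        watermark_ms=100_000,
        b_table={"k1": "b1"},
        c_table={"k1": "c1", "k2": "c2"},
        external_lookup=lambda k: "ext-" + k,
    )
    print(result)
```

In Flink terms, the late path would typically come from a side output of
the first join and be merged back (e.g. via union) before the second
stage, so the (A, B, C) logic exists only once in the job graph.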
Best,
Guowei


On Mon, Jan 25, 2021 at 4:18 AM Jose Velasco <jose.velasco.h...@gmail.com>
wrote:

> Hi everybody!
>
> I'm so excited to be here asking a first question about Flink DataStream
> API.
>
> I have a basic enrichment pipeline (event time). Basically, there's a main
> stream A (Kafka source) being enriched with the info of 2 other streams: B
> and C. (Kafka sources as well).
> Basically, the enrichment graph consists of 2 stages:
>
> 1. The stream A is enriched with stream B, resulting in stream (A, B)
> 2. The stream (A, B) is enriched with C, resulting in a stream (A, B, C).
>
>
> I've created a side output for late events A. The flow for these events
> would be slightly different:
>
> 1. The stream A is enriched by fetching the info from an external service
> using Flink Async I/O, resulting in stream (A, B)
> 2. Then, the stream (A, B) is enriched with C, resulting in a stream (A,
> B, C) (same as before)
>
>
> Note that late A events can arrive out of order (by weeks in some cases)
>
>
> I was wondering where the late-events flow should be defined. One
> graph/job for both flows, or 2 graphs/jobs with different time
> characteristics? What's the general pattern for cases like this?
>
>
>
> Many thanks, Jose Velasco
>
