Restructuring with your tip now, Michael, thank you!
---
Oytun Tez
*M O T A W O R D*
The World's Fastest Human Translation Platform.
oy...@motaword.com — www.motaword.com
On Fri, Feb 22, 2019 at 11:23 AM Michael Latta
wrote:
> You may want to union the 3 streams prior to the process function if they
> are independently processed.
>
>
> Michael
>
> On Feb 22, 2019, at 9:15 AM, Oytun Tez wrote:
>
> Hi everyone!
>
> I've been struggling with an implementation problem in the last days,
> which I am almost sure caused by my misunderstanding of Flink.
>
> The purpose: consume multiple streams, update a score list (with meta data
> e.g. user_id) for each update coming from any of the streams. The new
> output list will also need to be used by another pattern.
>
>1. We created 3 SourceFunctions, that periodically go to our MySQL
>database and stream new results back. This one returns POJOs.
>2. Then we flatMap these streams to unify their Type. They are now all
>Tuple3s with matching types.
>3. And we process each stream with the same ProcessFunction.
>4. I am stuck with the output list.
>
> Business case (human translation workflow):
>
>1. Input: Stream "translation quality" score updates of each
>translator [translator_id, score]
>2. Input: Stream "responsivity score" updates of each translator
>(email open rates/speeds etc) [translator_id, score]
>3. Input: Stream "number of projects" updates each translator worked
>on [translator_id, score]
>4. Calculation: for each translator, use 3 scores to come up with a
>unified score and its percentile over all translators. This step definitely
>feels like a Batch job, but I am pushing to go with a streaming mindset.
>5. So now supposedly, in this way or another, I have a list of
>translators with their unified score and percentile over this list.
>6. Another independent stream should send me updates on "need for
>proofreaders" – I couldn't even come to this point yet. Once a need info is
>streamed, application would fetch the previously calculated list and let's
>say picks the top X determined by the message from need algorithm.
>
>
>
>
> Overall, my desire is to make everything a stream and let the data and
> decisions constantly react to stream updates. I am very confused at this
> point. Tried using keyed and operator states, but they seem to be keeping
> their state only for their own items. Considering to do Batch instead after
> all the struggle.
>
> Any ideas? I can even get on a call.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> ---
> Oytun Tez
>
> *M O T A W O R D*
> The World's Fastest Human Translation Platform.
> oy...@motaword.com — www.motaword.com
>
>