Hello,

Finally, even after creating my operator, I still get the error : "Timers can 
only be used on keyed operators".

Isn't there any way around this ? A way to "key" my stream without shuffling 
the data ?

From: Gwenhael Pasquiers
Sent: vendredi 10 novembre 2017 11:42
To: Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com>; 
'user@flink.apache.org' <user@flink.apache.org>
Subject: RE: Streaming : a way to "key by partition id" without redispatching 
data

Maybe you don't need to bother with that question.

I'm currently discovering AbstractStreamOperator, OneInputStreamOperator and 
Triggerable.

That should do it :-)

From: Gwenhael Pasquiers [mailto:gwenhael.pasqui...@ericsson.com]
Sent: jeudi 9 novembre 2017 18:00
To: 'user@flink.apache.org' 
<user@flink.apache.org<mailto:user@flink.apache.org>>
Subject: Streaming : a way to "key by partition id" without redispatching data

Hello,

(Flink 1.2.1)

For performances reasons I'm trying to reduce the volume of data of my stream 
as soon as possible by windowing/folding it for 15 minutes before continuing to 
the rest of the chain that contains keyBys and windows that will transfer data 
everywhere.

Because of the huge volume of data, I want to avoid "moving" the data between 
partitions as much as possible (not like a naïve KeyBy does). I wanted to 
create a custom ProcessFunction (using timer and state to fold data for X 
minutes) in order to fold my data over itself before keying the stream but even 
ProcessFunction needs a keyed stream...

Is there a specific "key" value that would ensure me that my data won't be 
moved to another taskmanager (that it's hashcode will match the partition it is 
already in) ? I thought about the subtask id but I doubt I'd be that lucky :-)

Suggestions

·         Wouldn't it be useful to be able to do a "partitionnedKeyBy" that 
would not move data between nodes, for windowing operations that can be 
parallelized.

o   Something like kafka => partitionnedKeyBy(0) => first folding => keyBy(0) 
=> second folding => ....

·         Finally, aren't all streams keyed ? Even if they're keyed by a 
totally arbitrary partition id until the user chooses its own key, shouldn't we 
be able to do a window (not windowAll) or process over any normal Stream's 
partition ?

B.R.

Gwenhaël PASQUIERS

Reply via email to