On Wed, Aug 30, 2017 at 11:50 AM, Oleksandr Baliev <
aleksanderba...@gmail.com> wrote:

>
> So the main question is how to synchronize data reading between kafka
> partitions when data is sequential per partitions, but late for some of
> them and we care about that data is not thrown away and will be fully
> processed for some time range (window) later? :) It's more about manually
> handling consumption on Kafka Fetch level and FlinkKafka* is high level for
> that, isn't it?
>


At some point you have to give up on late data and drop it if you are
performing a window computation.  That said, that point can be a long time
away, allowing for very out-of-order data.  Presumably most data won't be
late, and you want to output preliminary results so downstream consumers
get timely data.  In that case you want to implement a window trigger that
fires early at regular intervals (without purging) whenever it has received
new events since the last firing, and purges the window state once the
allowed lateness has passed.
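The decision rules above can be sketched as plain Java, with the trigger
modeled as a standalone class rather than Flink's actual Trigger API (the
class, method names, and timestamps here are illustrative, not Flink code;
see the gists below for real implementations):

```java
// Minimal sketch of the early-firing-with-lateness logic described above.
// Not Flink's Trigger interface; a self-contained model of the decisions.
public class EarlyFiringSketch {
    enum Result { CONTINUE, FIRE, FIRE_AND_PURGE }

    private final long earlyInterval;    // fire preliminary results this often
    private final long allowedLateness;  // keep state this long past window end
    private long lastFireTime = 0;       // timestamps assumed nonnegative
    private boolean newEventsSinceLastFire = false;

    EarlyFiringSketch(long earlyInterval, long allowedLateness) {
        this.earlyInterval = earlyInterval;
        this.allowedLateness = allowedLateness;
    }

    // Called for each element assigned to the window.
    void onElement() { newEventsSinceLastFire = true; }

    // Called as time advances; decides whether to emit a result.
    Result onTimer(long now, long windowEnd) {
        if (now >= windowEnd + allowedLateness) {
            // Lateness exhausted: emit the final result and drop state.
            return Result.FIRE_AND_PURGE;
        }
        if (newEventsSinceLastFire && now - lastFireTime >= earlyInterval) {
            // New events since the last firing: emit a preliminary
            // result but keep the state so late data can still update it.
            lastFireTime = now;
            newEventsSinceLastFire = false;
            return Result.FIRE;
        }
        return Result.CONTINUE;  // nothing new to report
    }

    public static void main(String[] args) {
        EarlyFiringSketch t = new EarlyFiringSketch(10, 100);
        long windowEnd = 50;
        t.onElement();
        System.out.println(t.onTimer(20, windowEnd));   // FIRE: new data, interval elapsed
        System.out.println(t.onTimer(25, windowEnd));   // CONTINUE: no new data since last fire
        t.onElement();
        System.out.println(t.onTimer(60, windowEnd));   // FIRE: a late element fires again
        System.out.println(t.onTimer(150, windowEnd));  // FIRE_AND_PURGE: past allowed lateness
    }
}
```

Note how a late element arriving after the window end (the firing at time
60) still produces an updated result, because the state is only purged once
the allowed lateness is exhausted.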

For instance, see this EventTimeTriggerWithEarlyAndLateFiring
<https://gist.github.com/eliaslevy/ec840444607b9a5dd5aa3eb2cdd77932> in
Java or this simplified EarlyFiringEventTimeTrigger
<https://gist.github.com/eliaslevy/43eca44e92fdef44e6717c60ea46e4d2> in
Scala.
