Hi,

We've been facing issues* w.r.t watermarks not supported per key, which led
us to:

Either (a) run the job in Processing time for a KeyedStream -> compromising
on use cases which revolve around catching time-based patterns
or (b) run the job in Event time for multiple data streams (one data stream
per key) -> this is not scalable as the number of operators grow linearly
with the number of keys

To address this, we've done a quick (poc) change in the
AbstractKeyedCEPPatternOperator to allow for the NFAs to progress based on
timestamps extracted from the events arriving into the operator (and not
from the watermarks). We've tested it against our usecase and are seeing a
significant improvement in memory usage without compromising on the
watermark functionality.

It'll be really helpful if someone from the cep dev group can take a look
at this branch - https://github.com/jainshailesh/flink/commits/cep_changes
and provide comments on the approach taken, and maybe guide us on the next
steps for taking it forward.

Thanks,
Shailesh

* Links to previous email threads related to the same issue:
http://apache-flink-user-mailing-list-archive.2336050.
n4.nabble.com/Question-on-event-time-functionality-
using-Flink-in-a-IoT-usecase-td18653.html
http://apache-flink-user-mailing-list-archive.2336050.
n4.nabble.com/Generate-watermarks-per-key-in-a-KeyedStream-td16629.html
http://apache-flink-user-mailing-list-archive.2336050.
n4.nabble.com/Correlation-between-number-of-operators-
and-Job-manager-memory-requirements-td18384.html

Reply via email to