You should be able to solve this kind of problem with Flink's CEP library.
The important thing here is to define a time window for the pattern (the
within(...) clause) so that partial matches can time out. Otherwise, you
will accumulate state for unfinished patterns which is never purged, and
the job will eventually run out of memory.
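
To illustrate why the timeout matters, here is a minimal plain-Java sketch
of the idea. It is not FlinkCEP code and not how the library is implemented.
The class name, the method names, and the choice to start the 5-day clock
at the first add-to-cart event are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: per-user partial matches that expire after a window.
final class AbandonedCartSketch {
    // Assumed window: 5 days, taken from the "Abandoned Cart" example.
    static final long WINDOW_MS = 5L * 24 * 60 * 60 * 1000;

    // One entry per user with an open partial match:
    // userId -> timestamp of the first add-to-cart event.
    private final Map<String, Long> openCarts = new HashMap<>();

    // Start a partial match unless one is already open for this user.
    void onAddToCart(String userId, long timestampMs) {
        openCarts.putIfAbsent(userId, timestampMs);
    }

    // A purchase cancels the partial match, so its state is dropped.
    void onPurchase(String userId, long timestampMs) {
        openCarts.remove(userId);
    }

    // Advance time: report users whose window elapsed without a purchase
    // and purge their state. Without this step the map only ever grows.
    List<String> advanceTo(long nowMs) {
        List<String> abandoned = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = openCarts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() >= WINDOW_MS) {
                abandoned.add(entry.getKey());
                it.remove();
            }
        }
        return abandoned;
    }

    // Number of users that currently hold state.
    int stateSize() {
        return openCarts.size();
    }
}
```

With a window in place, the state is bounded by the number of users who
added an item within the last 5 days, not by the number of users ever seen.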
How complex would a pattern be (how many stages, what kind of payload)?
Depending on this, we should be able to estimate the resource requirements.
Or you give it a try and see to how many machines you can minimize the
Great to hear that you enjoyed the conference :-)
On Thu, Sep 15, 2016 at 6:13 PM, David Koch <ogd...@googlemail.com> wrote:
> Is FlinkCEP applicable to large key spaces with potentially long timeouts
> between events that define a pattern? Ideally, without ridiculous hardware.
> More concretely, we segment users (one key per user) based on sequences of
> events for that user.
> A segment "Abandoned Cart" may be defined by adding items during a
> browsing session but no purchase event within the following 5 days. The
> number of users is between 1 and 10 million.
> Is this type of segmentation scenario a viable use case for FlinkCEP?
> We currently segment by building incremental profiles in ES which are then
> "matched against segment definition queries" using ES percolators. In
> short, we incur costs when interacting with ES.
> PS: Thanks for FlinkForward 2016, very interesting presentations and
> equally important excellent catering ;-)