Hi Tejas,


I had your question sit in my mind for a while before I realized I had 
something to say about it 😊





Although not related to CEP, we had had a very similar problem with too many 
threads/tasks in an overwhelming split-join-pattern of about 1600 concurrent 
paths.



A colleague of mine worked on this for his master's thesis [1] ... we came to 
the conclusion to

- radically reduce fine grained parallelism,  i.e.

- use it (almost) only for Flink scale out (partitioning for specific key 
attributes)

- transform our algorithms to run multiple cases sequentially instead of 
parallel

- unify multiple key domains by generalizing the key and duplicating incoming 
events per key domain together with the unified key (ideas taken from e.g. [2])

- try to unify all state primitives that work on the same key into list or map 
state primitives, and iterate on these (this works especially well for RocksDB)

- patch Flink task chaining to create longer chains and allow chaining for 
operator with multiple inputs (only mentioned in [1]) ... (= fewer 
tasks/threads)



In your specific case with the CEP rules it is probably best

- to implement the patterns yourself or integrate some external library, but

- to make the CEP rules 'data' and broadcast them into the respective operators 
(ideally a single operator only), that iterates over the rules for each 
incoming event

- for the stateless rules, I once integrated Spring Boot SpEL for a similar 
rules system (precompiled when initially loaded, rules are actually quite fast)

- for the stateful rules

  - you could either integrate some proper library (which leaves you with the 
problem of integrating intermediate state into the Flink TypeInformation 
serialization system)

  - or implement it yourself e.g. by means of a regular expressions library 
that exposes its state transition tables generated for specific regular 
expressions



This way you could use Flink for what it is excellent (low-latency 
high-throughput stream processing with consistent state over restarts/crashes 
(e.g.)) and optimize in areas that are not optimal for your use case.





[1] https://www.merlin.uzh.ch/publication/show/21108

[2] https://www.youtube.com/watch?v=tHStmoj8WbQ

[3] 
https://docs.spring.io/spring-integration/docs/5.3.0.RELEASE/reference/html/spel.html



Feel free to get back to me for further discussion (on the user list)



Thias







On 2021/08/19 23:39:18, Tejas B <t...@gmail.com> wrote:

> Hi,>

> Here's our use case :>

> We are planning to build a rule based engine on top of flink with huge number 
> of rules(1000s). the rules could be stateless or stateful. >

> Example stateless rule is : A.id = 3 && A.name = 'abc' || A.color = red. >

> Example stateful rule is : A is event.id =3, B is event.name = 'abc', C is 
> event.color = red and we are looking for pattern AB*C over time window of 1 
> hour.>

>

> Now we have tried to use flink CEP for this purpose and program crashed 
> because of lot of threads. The explanation is : every application of 
> CEP.pattern creates a new operator in the graph and flink can't support that 
> many vertices in a graph.>

>

> Other approach could be to use processFunction in flink, but still to run the 
> rules on events stream you'd have to use some sort of CEP or write your own.>

>

> My question is, does anybody have any other suggestions on how to achieve 
> this ? Any other CEPs that integrate and work better with flink (siddhi, 
> jasper, drools) ? Any experience would be helpful.>

>
Diese Nachricht ist ausschliesslich für den Adressaten bestimmt und beinhaltet 
unter Umständen vertrauliche Mitteilungen. Da die Vertraulichkeit von 
e-Mail-Nachrichten nicht gewährleistet werden kann, übernehmen wir keine 
Haftung für die Gewährung der Vertraulichkeit und Unversehrtheit dieser 
Mitteilung. Bei irrtümlicher Zustellung bitten wir Sie um Benachrichtigung per 
e-Mail und um Löschung dieser Nachricht sowie eventueller Anhänge. Jegliche 
unberechtigte Verwendung oder Verbreitung dieser Informationen ist streng 
verboten.

This message is intended only for the named recipient and may contain 
confidential or privileged information. As the confidentiality of email 
communication cannot be guaranteed, we do not accept any responsibility for the 
confidentiality and the intactness of this message. If you have received it in 
error, please advise the sender by return e-mail and delete this message and 
any attachments. Any unauthorised use or dissemination of this information is 
strictly prohibited.

Reply via email to