Hi Tejas,
Your question sat in my mind for a while before I realized I had something to say about it 😊

Although not related to CEP, we had a very similar problem with too many threads/tasks in an overwhelming split-join pattern of about 1600 concurrent paths. A colleague of mine worked on this for his master's thesis [1]. We came to the conclusion to

- radically reduce fine-grained parallelism, i.e.
  - use it (almost) only for Flink scale-out (partitioning by specific key attributes)
  - transform our algorithms to run multiple cases sequentially instead of in parallel
  - unify multiple key domains by generalizing the key and duplicating incoming events per key domain together with the unified key (ideas taken from e.g. [2])
  - try to unify all state primitives that work on the same key into list or map state primitives, and iterate over these (this works especially well for RocksDB)
- patch Flink task chaining to create longer chains and allow chaining for operators with multiple inputs (only mentioned in [1]) (= fewer tasks/threads)

In your specific case with the CEP rules it is probably best

- to implement the patterns yourself or integrate some external library, and
- to make the CEP rules 'data' and broadcast them into the respective operators (ideally a single operator only), which then iterate over the rules for each incoming event
  - for the stateless rules, I once integrated Spring SpEL (the Spring Expression Language) [3] for a similar rules system (the rules are precompiled when initially loaded and are actually quite fast)
  - for the stateful rules
    - you could either integrate some proper library (which leaves you with the problem of integrating its intermediate state into the Flink TypeInformation serialization system)
    - or implement it yourself, e.g. by means of a regular expression library that exposes the state transition tables it generates for specific regular expressions

This way you could use Flink for what it is excellent at (e.g. low-latency, high-throughput stream processing with consistent state across restarts/crashes) and optimize in the areas where it is not optimal for your use case.

[1] https://www.merlin.uzh.ch/publication/show/21108
[2] https://www.youtube.com/watch?v=tHStmoj8WbQ
[3] https://docs.spring.io/spring-integration/docs/5.3.0.RELEASE/reference/html/spel.html

Feel free to get back to me for further discussion (on the user list)

Thias

On 2021/08/19 23:39:18, Tejas B <t...@gmail.com> wrote:
> Hi,
>
> Here's our use case:
> We are planning to build a rule based engine on top of flink with a huge
> number of rules (1000s). The rules could be stateless or stateful.
> Example stateless rule is: A.id = 3 && A.name = 'abc' || A.color = red.
> Example stateful rule is: A is event.id = 3, B is event.name = 'abc', C is
> event.color = red and we are looking for pattern AB*C over time window of 1
> hour.
>
> Now we have tried to use flink CEP for this purpose and program crashed
> because of lot of threads. The explanation is: every application of
> CEP.pattern creates a new operator in the graph and flink can't support that
> many vertices in a graph.
>
> Other approach could be to use processFunction in flink, but still to run the
> rules on events stream you'd have to use some sort of CEP or write your own.
>
> My question is, does anybody have any other suggestions on how to achieve
> this? Any other CEPs that integrate and work better with flink (siddhi,
> jasper, drools)? Any experience would be helpful.
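P.S. To make the "rules as data, one operator iterating over them" idea from my reply a bit more concrete, here is a minimal sketch in plain Java without any Flink dependency. It is an illustration under assumptions, not what we actually run: all names (RuleEngineSketch, SequenceRule, Event, addRule, onEvent) are made up, the rule predicates are assumed to be mutually exclusive, the pattern is matched with strict contiguity, and the one-hour window from the question is left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Sketch only: every class and method name here is hypothetical, not a Flink API.
class RuleEngineSketch {

    /** Minimal event with the fields used in the example rules of the question. */
    static final class Event {
        final int id;
        final String name;
        final String color;
        Event(int id, String name, String color) {
            this.id = id; this.name = name; this.color = color;
        }
    }

    /** A stateful rule "A B* C" held as plain data: one predicate per symbol. */
    static final class SequenceRule {
        final String ruleId;
        final Predicate<Event> a, b, c;
        SequenceRule(String ruleId, Predicate<Event> a, Predicate<Event> b, Predicate<Event> c) {
            this.ruleId = ruleId; this.a = a; this.b = b; this.c = c;
        }
        /** Maps an event to a symbol index; assumes the predicates are mutually exclusive. */
        int classify(Event e) {
            if (a.test(e)) return 0;
            if (b.test(e)) return 1;
            if (c.test(e)) return 2;
            return 3; // any other event
        }
    }

    static final int START = 0, SEEN_A = 1, MATCHED = 2;

    // Transition table for A B* C with strict contiguity: TABLE[state][symbol] -> next state.
    // Columns: A, B, C, other.
    static final int[][] TABLE = {
        /* START  */ {SEEN_A, START,  START,   START},
        /* SEEN_A */ {SEEN_A, SEEN_A, MATCHED, START},
    };

    // Stands in for broadcast state: all rules, visible to every key.
    private final Map<String, SequenceRule> rules = new LinkedHashMap<>();
    // Stands in for one keyed map state: key -> (ruleId -> current automaton state).
    private final Map<String, Map<String, Integer>> stateByKey = new HashMap<>();

    /** Registers or replaces a rule at runtime (the "broadcast" side). */
    void addRule(SequenceRule rule) {
        rules.put(rule.ruleId, rule);
    }

    /** Runs every rule against one event and returns the ids of rules that just matched. */
    List<String> onEvent(String key, Event e) {
        Map<String, Integer> states = stateByKey.computeIfAbsent(key, k -> new HashMap<>());
        List<String> matched = new ArrayList<>();
        for (SequenceRule rule : rules.values()) {
            int current = states.getOrDefault(rule.ruleId, START);
            int next = TABLE[current][rule.classify(e)];
            if (next == MATCHED) {
                matched.add(rule.ruleId);
                next = START; // start over after a full match
            }
            if (next == START) {
                states.remove(rule.ruleId); // keep only non-trivial state
            } else {
                states.put(rule.ruleId, next);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        RuleEngineSketch engine = new RuleEngineSketch();
        engine.addRule(new SequenceRule("r1",
                e -> e.id == 3, e -> "abc".equals(e.name), e -> "red".equals(e.color)));
        System.out.println(engine.onEvent("k", new Event(3, "x", "blue")));   // A
        System.out.println(engine.onEvent("k", new Event(1, "abc", "blue"))); // B
        System.out.println(engine.onEvent("k", new Event(1, "abc", "blue"))); // B
        System.out.println(engine.onEvent("k", new Event(1, "x", "red")));    // C: prints [r1]
    }
}
```

In an actual Flink job I would expect addRule to correspond roughly to processBroadcastElement and onEvent to processElement of a KeyedBroadcastProcessFunction, with the rules map kept in broadcast state and the per-key map kept in a single keyed MapState. With thousands of rules this is still one operator and one state primitive per key, instead of one operator per rule.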