Re: Strategies for Complex Event Processing with guaranteed data consistency

2017-01-18 Thread Fabian Hueske
Hi Kat, thanks for the clarification about cases and traces. Regarding the aggregation of traces: you can either do that in the same job that constructs the cases, or in a separate job that is decoupled via, for instance, Kafka. If I got your requirements right, you need a mechanism for retraction. A case
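A minimal sketch of such a retraction mechanism in Java, assuming the case-building job re-emits (caseId, runTimeMinutes) pairs whenever a case grows; every name below is illustrative, not from the thread:

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    // Turns each case update into (value, weight) pairs: the previously
    // emitted run time is retracted with weight -1 before the new run time
    // is added with weight +1, so a decoupled downstream aggregation over
    // these pairs never double-counts a case that was updated.
    public class RetractingEmitter
            extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<Long, Integer>> {

        private transient ValueState<Long> lastEmitted; // last run time per case id

        @Override
        public void open(Configuration parameters) {
            lastEmitted = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("lastEmitted", Long.class));
        }

        @Override
        public void flatMap(Tuple2<String, Long> update,
                            Collector<Tuple2<Long, Integer>> out) throws Exception {
            Long previous = lastEmitted.value();
            if (previous != null) {
                out.collect(Tuple2.of(previous, -1)); // retract the stale value
            }
            out.collect(Tuple2.of(update.f1, +1));    // add the updated value
            lastEmitted.update(update.f1);
        }
    }

Applied on the stream keyed by case id, e.g. updates.keyBy(u -> u.f0).flatMap(new RetractingEmitter()).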

Re: Strategies for Complex Event Processing with guaranteed data consistency

2017-01-16 Thread Kathleen Sharp
Hi Fabian, a case consists of all events sharing the same case id. This id is what we initially key the stream by. The order of these events is the trace. For example: caseid: case1, consisting of event1, event2, event3; start time 11:00, end 11:05, run time 5 minutes. caseid: case12, consisting
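For illustration, the model described above could be sketched with two plain Java types; the field names are assumptions for illustration, not from the actual pipeline:

    import java.util.ArrayList;
    import java.util.List;

    // One raw event; events sharing the same caseId belong to one case.
    public class Event {
        public String caseId;    // e.g. "case1"
        public String eventId;   // e.g. "event1"
        public long timestamp;   // event time in epoch millis
    }

    // A case: all events of one caseId; the ordered eventIds are the trace.
    public class Case {
        public String caseId;
        public List<String> trace = new ArrayList<>(); // event order = trace
        public long startTime;   // timestamp of the first event
        public long endTime;     // timestamp of the latest event

        public long runTimeMillis() {
            return endTime - startTime; // e.g. 11:00 to 11:05 -> 5 minutes
        }
    }

The stream would then be keyed as events.keyBy(e -> e.caseId), matching "this id is what we initially key the stream by".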

Re: Strategies for Complex Event Processing with guaranteed data consistency

2017-01-13 Thread Fabian Hueske
One thing to add: the Flink KafkaProducer provides at-least-once guarantees only if flush-on-checkpoint is enabled [1]. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.1/api/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducerBase.html#setFlushOnCheckpoint-boolean-
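A short sketch of what enabling that flag looks like, per the javadoc linked above; broker address, topic name, and the toy input are placeholders, and checkpointing must also be enabled since the flush happens on checkpoints:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(10_000); // flush-on-checkpoint only takes effect with checkpointing on

    DataStream<String> caseUpdates = env.fromElements("case1,300000", "case12,120000"); // toy input

    FlinkKafkaProducer09<String> producer = new FlinkKafkaProducer09<>(
            "broker1:9092",   // placeholder bootstrap server
            "case-updates",   // placeholder topic
            new SimpleStringSchema());

    // A checkpoint now completes only after all in-flight records have been
    // acknowledged by Kafka, which is what makes the sink at-least-once.
    producer.setFlushOnCheckpoint(true);

    caseUpdates.addSink(producer);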

Re: Strategies for Complex Event Processing with guaranteed data consistency

2017-01-13 Thread Fabian Hueske
Hi Kat, I did not understand the difference between a case and a trace. If I got it right, the goal of your first job is to assemble the individual events into cases. Is a case here the last event for a case-id or all events of a case-id? If a case is the collection of all events (which I assume)
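If a case is indeed the collection of all events for a case-id, the first job could look roughly like the following sketch, reusing the Event and Case types sketched earlier. It assumes events arrive in order per case, and every name is illustrative rather than taken from the thread:

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    // Accumulates all events of one case-id into a Case held in keyed state
    // and emits the updated Case after every event, so downstream consumers
    // always see the latest version of the case.
    public class CaseAssembler extends RichFlatMapFunction<Event, Case> {

        private transient ValueState<Case> caseState;

        @Override
        public void open(Configuration parameters) {
            caseState = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("case", Case.class));
        }

        @Override
        public void flatMap(Event event, Collector<Case> out) throws Exception {
            Case c = caseState.value();
            if (c == null) {                  // first event of this case-id
                c = new Case();
                c.caseId = event.caseId;
                c.startTime = event.timestamp;
            }
            c.trace.add(event.eventId);       // the event order is the trace
            c.endTime = event.timestamp;      // assumes in-order events per case
            caseState.update(c);
            out.collect(c);                   // emit the updated case
        }
    }

Used as events.keyBy(e -> e.caseId).flatMap(new CaseAssembler()).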