Re: [akka-user] Stream deduping

2017-07-10 Thread Shiva Ramagopal
Hi, thanks for the answers! Michal, your approach seems the most appropriate for my case, as it dedups *and* handles late records. Your point about losing the messages in the map on restart after a failure is very valid. One way of handling this is to have checkpoints at window level. Roughly speaking,

Re: [akka-user] Stream deduping

2017-06-24 Thread 'Michal Borowiecki' via Akka User List
I drafted an implementation outline in kafka-streams to address the problem of sliding-window reordering (to cater for late messages within the time window); it also caters for de-duplication:
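The drafted outline itself is not included in this excerpt. Purely as an illustration of the idea, and not Michal's kafka-streams implementation, here is a minimal plain-Java sketch of sliding-window reordering with de-duplication. The class name `ReorderingDeduper`, the `accept` method, and the string record ids are all hypothetical. It assumes every record carries an id and an event timestamp, and that a duplicate carries the same id and timestamp as the original.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: buffers records for windowMillis, drops duplicates,
// and releases records in timestamp order once the window has passed them.
final class ReorderingDeduper {
    private final long windowMillis;
    private long maxTimestamp = Long.MIN_VALUE;   // highest event time seen so far
    // Buffered record ids, sorted by event timestamp.
    private final TreeMap<Long, List<String>> buffer = new TreeMap<>();
    // Ids currently held in the buffer, for duplicate detection.
    private final Set<String> bufferedIds = new HashSet<>();

    ReorderingDeduper(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Accepts one record and returns the ids that are now safe to emit,
    // ordered by timestamp. The result is empty if nothing is ready yet.
    List<String> accept(String id, long timestamp) {
        List<String> ready = new ArrayList<>();
        // Too late: the window has already moved past this timestamp.
        // This also drops duplicates of records that were already emitted.
        if (maxTimestamp != Long.MIN_VALUE && timestamp <= maxTimestamp - windowMillis) {
            return ready;
        }
        // Duplicate of a record that is still buffered.
        if (!bufferedIds.add(id)) {
            return ready;
        }
        buffer.computeIfAbsent(timestamp, t -> new ArrayList<>()).add(id);
        maxTimestamp = Math.max(maxTimestamp, timestamp);

        // Everything at or before the watermark can no longer be overtaken
        // by an accepted late record, so it is released in order.
        long watermark = maxTimestamp - windowMillis;
        while (!buffer.isEmpty() && buffer.firstKey() <= watermark) {
            List<String> ids = buffer.pollFirstEntry().getValue();
            bufferedIds.removeAll(ids);
            ready.addAll(ids);
        }
        return ready;
    }
}
```

With a window of 10, feeding records stamped 100, 105, and then a late 102 emits nothing until a record stamped 113 arrives, at which point 100 and 102 come out in order. The buffer lives only in memory, so its contents are lost on restart unless they are persisted elsewhere, and a real implementation would also need a way to flush the remaining records at end of stream.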

Re: [akka-user] Stream deduping

2017-06-23 Thread Anil Gursel
It is not the latest and greatest; however, here is an Akka Streams GraphStage implementation for deduplication: https://squbs.readthedocs.io/en/latest/deduplicate/. Everything happens in memory, so you need to watch memory growth and potentially pass a custom registry that cleans itself after a while.
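To make the "registry that cleans itself" idea concrete, here is a minimal plain-Java sketch of a time-bounded registry. It is not the squbs registry interface or any Akka API; the class `ExpiringDedupRegistry` and its methods are hypothetical names. It assumes the caller passes non-decreasing clock readings, so that insertion order matches age.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of squbs or Akka: remembers each key for a
// fixed time-to-live and then forgets it, so memory stays bounded.
final class ExpiringDedupRegistry<K> {
    private final long ttlMillis;
    // Insertion-ordered map from key to the time it was first seen.
    // With non-decreasing clock readings, the oldest entries come first.
    private final LinkedHashMap<K, Long> firstSeen = new LinkedHashMap<>();

    ExpiringDedupRegistry(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Returns true if the key is new (emit the element) and false if it is
    // a duplicate seen within the last ttlMillis (drop the element).
    boolean firstTimeSeen(K key, long nowMillis) {
        evictExpired(nowMillis);
        return firstSeen.putIfAbsent(key, nowMillis) == null;
    }

    // Number of keys currently remembered.
    int size() {
        return firstSeen.size();
    }

    // Removes entries from the old end of the map until one is still fresh.
    private void evictExpired(long nowMillis) {
        Iterator<Map.Entry<K, Long>> it = firstSeen.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() < ttlMillis) {
                break;
            }
            it.remove();
        }
    }
}
```

The trade-off is that a duplicate arriving after its key has expired is treated as new, so the time-to-live has to cover the longest delay expected between an element and its duplicates.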

[akka-user] Stream deduping

2017-06-23 Thread Shiva Ramagopal
Hi, I'm looking for the latest and greatest techniques and thoughts in stream deduplication and would love to know if anyone here has done this at scale. Specifically, I'm looking for deduping that also handles late-arriving messages. In the past few days of my search, I've mostly come across